pyFTS.data package

Module contents

Module for pyFTS standard datasets facilities

Submodules

pyFTS.data.common module

pyFTS.data.common.get_dataframe(filename: str, url: str, sep: str = ';', compression: str = 'infer') → pandas.core.frame.DataFrame[source]

This function checks whether filename already exists locally. If it does, the file is read and its data returned. If it does not, the file is first downloaded from url and decompressed.

Parameters:
  • filename – dataset local filename
  • url – dataset internet URL
  • sep – CSV field separator
  • compression – type of compression
Returns:

Pandas DataFrame
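The cache-then-download behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the pyFTS implementation: the helper name load_cached_rows is hypothetical, it returns a list of dictionaries instead of a pandas DataFrame, and it skips decompression.

```python
import csv
import os
import urllib.request


def load_cached_rows(filename, url, sep=";"):
    """Return the rows of a delimited file, downloading it first if missing.

    Hypothetical helper (not part of pyFTS): no decompression, no pandas.
    """
    if not os.path.exists(filename):
        # Only reached when there is no local copy yet.
        urllib.request.urlretrieve(url, filename)
    with open(filename, newline="", encoding="utf-8") as handle:
        # Each row becomes a dict keyed by the header line.
        return list(csv.DictReader(handle, delimiter=sep))
```

Because the existence check comes first, a second call with the same filename never touches the network.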

Datasets

Artificial and synthetic data generators

Facilities to generate synthetic stochastic processes

class pyFTS.data.artificial.SignalEmulator(**kwargs)[source]

Bases: object

Emulate a complex signal built from several additive and non-additive components

blip(**kwargs)[source]

Creates an outlier greater than the maximum or lower than the minimum of the previous values of the signal, and inserts it at a random location in the signal.

Returns:the current SignalEmulator instance, for method chaining
components = None

Components of the signal

incremental_gaussian(mu: float, sigma: float, **kwargs)[source]

Adds Gaussian interference to the previous signal

Parameters:
  • mu – increment on mean
  • sigma – increment on variance
  • start – lag index to start this signal, the default value is 0
  • it – Number of iterations, the default value is 1
  • length – Number of samples generated on each iteration, the default value is 100
  • vmin – Lower bound value of generated data, the default value is None
  • vmax – Upper bound value of generated data, the default value is None
Returns:

the current SignalEmulator instance, for method chaining

periodic_gaussian(type: str, period: int, mu_min: float, sigma_min: float, mu_max: float, sigma_max: float, **kwargs)[source]

Adds periodic Gaussian interference to the previous signal

Parameters:
  • type – ‘linear’ or ‘sinoidal’
  • period – the period of recurrence
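  • mu_min – initial (and minimum) mean of each period
  • sigma_min – initial (and minimum) variance of each period
  • mu_max – final (and maximum) mean of each period
  • sigma_max – final (and maximum) variance of each period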
  • start – lag index to start this signal, the default value is 0
  • it – Number of iterations, the default value is 1
  • length – Number of samples generated on each iteration, the default value is 100
  • vmin – Lower bound value of generated data, the default value is None
  • vmax – Upper bound value of generated data, the default value is None
Returns:

the current SignalEmulator instance, for method chaining

run()[source]

Render the signal

Returns:a list of float values
stationary_gaussian(mu: float, sigma: float, **kwargs)[source]

Creates a continuous Gaussian signal with mean mu and variance sigma.

Parameters:
  • mu – mean
  • sigma – variance
  • additive – If False, this signal replaces the previous one; if True, it is added to the previous one
  • start – lag index to start this signal, the default value is 0
  • it – Number of iterations, the default value is 1
  • length – Number of samples generated on each iteration, the default value is 100
  • vmin – Lower bound value of generated data, the default value is None
  • vmax – Upper bound value of generated data, the default value is None
Returns:

the current SignalEmulator instance, for method chaining
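Since every builder method returns the current instance, components can be queued in one chained expression and rendered at the end with run(). The toy class below is a sketch of that pattern only. MiniSignal is a hypothetical stand-in, not the pyFTS class; it supports just two component types, ignores start, it, vmin and vmax, and its blip always places the outlier above the maximum.

```python
import random


class MiniSignal:
    """Toy chainable signal builder (illustration only, not pyFTS code)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.components = []  # queued components, rendered by run()

    def stationary_gaussian(self, mu, sigma, length=100, additive=False):
        self.components.append(("gauss", mu, sigma, length, additive))
        return self  # returning self is what makes chaining possible

    def blip(self):
        self.components.append(("blip",))
        return self

    def run(self):
        signal = []
        for kind, *params in self.components:
            if kind == "gauss":
                mu, sigma, length, additive = params
                noise = [self._rng.gauss(mu, sigma) for _ in range(length)]
                if additive:
                    # Pad a shorter signal with zeros, then add element-wise.
                    padded = signal + [0.0] * (length - len(signal))
                    signal = [s + v for s, v in zip(padded, noise)] + padded[length:]
                else:
                    # Non-additive: the new component replaces the signal.
                    signal = noise
            elif kind == "blip" and signal:
                # Simplification: the outlier always exceeds the current maximum.
                pos = self._rng.randrange(len(signal))
                signal[pos] = max(signal) + 1.0
        return signal


signal = (MiniSignal(seed=1)
          .stationary_gaussian(0.0, 1.0, length=50)
          .stationary_gaussian(5.0, 0.5, length=50, additive=True)
          .blip()
          .run())
```

Nothing is sampled until run() is called, so the order of the chained calls is the order in which components are applied.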

pyFTS.data.artificial.generate_gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10, vmin=None, vmax=None)[source]

Generate data sampled from Gaussian distribution, with constant or linear changing parameters

Parameters:
  • mu_ini – Initial mean
  • sigma_ini – Initial variance
  • mu_inc – Mean increment after ‘num’ samples
  • sigma_inc – Variance increment after ‘num’ samples
  • it – Number of iterations
  • num – Number of samples generated on each iteration
  • vmin – Lower bound value of generated data
  • vmax – Upper bound value of generated data
Returns:

A list of it*num float values
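The batch structure described by these parameters (it batches of num samples, with the mean and spread shifted after each batch) can be sketched as follows. This is an illustration under stated assumptions rather than the library code: gaussian_linear and its seed argument are hypothetical, sigma is passed to random.gauss as a standard deviation, and vmin/vmax are assumed to be enforced by clipping.

```python
import random


def gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc,
                    it=100, num=10, vmin=None, vmax=None, seed=None):
    """Sketch: `it` batches of `num` Gaussian samples with drifting parameters."""
    rng = random.Random(seed)
    mu, sigma = mu_ini, sigma_ini
    samples = []
    for _ in range(it):
        for _ in range(num):
            value = rng.gauss(mu, sigma)  # sigma used as a standard deviation
            if vmin is not None:
                value = max(value, vmin)  # assumption: bounds applied by clipping
            if vmax is not None:
                value = min(value, vmax)
            samples.append(value)
        # Parameters change only between batches, not between samples.
        mu += mu_inc
        sigma = max(sigma + sigma_inc, 0.0)  # keep the spread non-negative
    return samples
```

Setting mu_inc and sigma_inc to zero gives the constant-parameter case, and the result always has it*num values.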

pyFTS.data.artificial.generate_linear_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)[source]

Generates a periodic linear variation on mean and variance

Parameters:
  • period – the period of recurrence
  • mu_min – initial (and minimum) mean of each period
  • sigma_min – initial (and minimum) variance of each period
  • mu_max – final (and maximum) mean of each period
  • sigma_max – final (and maximum) variance of each period
  • it – Number of iterations
  • num – Number of samples generated on each iteration
  • vmin – Lower bound value of generated data
  • vmax – Upper bound value of generated data
Returns:

A list of it*num float values

pyFTS.data.artificial.generate_sinoidal_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)[source]

Generates a periodic sinusoidal variation on mean and variance

Parameters:
  • period – the period of recurrence
  • mu_min – initial (and minimum) mean of each period
  • sigma_min – initial (and minimum) variance of each period
  • mu_max – final (and maximum) mean of each period
  • sigma_max – final (and maximum) variance of each period
  • it – Number of iterations
  • num – Number of samples generated on each iteration
  • vmin – Lower bound value of generated data
  • vmax – Upper bound value of generated data
Returns:

A list of it*num float values

pyFTS.data.artificial.generate_uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10, vmin=None, vmax=None)[source]

Generate data sampled from Uniform distribution, with constant or linear changing bounds

Parameters:
  • min_ini – Initial lower bound
  • max_ini – Initial upper bound
  • min_inc – Lower bound increment after ‘num’ samples
  • max_inc – Upper bound increment after ‘num’ samples
  • it – Number of iterations
  • num – Number of samples generated on each iteration
  • vmin – Lower bound value of generated data
  • vmax – Upper bound value of generated data
Returns:

A list of it*num float values

pyFTS.data.artificial.random_walk(n=500, type='gaussian')[source]

Simple random walk

Parameters:
  • n – number of samples
  • type – ‘gaussian’ or ‘uniform’
Returns:

pyFTS.data.artificial.white_noise(n=500)[source]

Simple Gaussian noise signal

Parameters:n – number of samples
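The relationship between the two generators is that a random walk is the running sum of independent noise steps. The sketch below shows this with the standard library. The function names, the seed argument, the unit-variance steps and the [-1, 1] range for uniform steps are all assumptions made for illustration; they are not taken from pyFTS.

```python
import random
from itertools import accumulate


def white_noise_sketch(n=500, seed=None):
    """n independent draws from a standard Gaussian (assumed mean 0, std 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk_sketch(n=500, kind="gaussian", seed=None):
    """Cumulative sum of n random steps."""
    rng = random.Random(seed)
    if kind == "gaussian":
        steps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    elif kind == "uniform":
        steps = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # assumed step range
    else:
        raise ValueError("kind must be 'gaussian' or 'uniform'")
    # Each position is the previous position plus the next step.
    return list(accumulate(steps))
```

Differencing the walk recovers the noise, which is why a random walk is non-stationary while its increments are stationary.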

AirPassengers dataset

Monthly totals of airline passengers from the USA, from January 1949 through December 1960.

Source: Hyndman, R.J., Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.

pyFTS.data.AirPassengers.get_data() → numpy.ndarray[source]

Get a simple univariate time series.

Returns:numpy array
pyFTS.data.AirPassengers.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

Bitcoin dataset

Bitcoin to USD quotations

Daily averaged index, by business day, from 2010 to 2018.

Source: https://finance.yahoo.com/quote/BTC-USD?p=BTC-USD

pyFTS.data.Bitcoin.get_data(field: str = 'AVG') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.Bitcoin.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

DowJones dataset

DJI - Dow Jones

Daily averaged index, by business day, from 1985 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.DowJones.get_data(field: str = 'AVG') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.DowJones.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

Enrollments dataset

Yearly University of Alabama enrollments from 1971 to 1992.

pyFTS.data.Enrollments.get_data() → numpy.ndarray[source]

Get a simple univariate time series.

Returns:numpy array
pyFTS.data.Enrollments.get_dataframe() → pandas.core.frame.DataFrame[source]

Ethereum dataset

Ethereum to USD quotations

Daily averaged index, by business day, from 2016 to 2018.

Source: https://finance.yahoo.com/quote/ETH-USD?p=ETH-USD

pyFTS.data.Ethereum.get_data(field: str = 'AVG') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.Ethereum.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

EUR-GBP dataset

FOREX market EUR-GBP pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURGBP.get_data(field: str = 'avg') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.EURGBP.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

EUR-USD dataset

FOREX market EUR-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURUSD.get_data(field: str = 'avg') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.EURUSD.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

GBP-USD dataset

FOREX market GBP-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.GBPUSD.get_data(field: str = 'avg') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.GBPUSD.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

INMET dataset

INMET - Instituto Nacional de Meteorologia, Brasil

Belo Horizonte station, from 2000-01-01 to 2012-12-31

Source: http://www.inmet.gov.br

pyFTS.data.INMET.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

Malaysia dataset

Hourly Malaysia electric load and temperature

pyFTS.data.Malaysia.get_data(field: str = 'load') → numpy.ndarray[source]

Get the univariate time series data.

Parameters:field – dataset field to load
Returns:numpy array
pyFTS.data.Malaysia.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

NASDAQ dataset

National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ IXIC)

Daily averaged index by business day, from 2000 to 2016.

Source: http://www.nasdaq.com/aspx/flashquotes.aspx?symbol=IXIC&selected=IXIC

pyFTS.data.NASDAQ.get_data(field: str = 'avg') → numpy.ndarray[source]

Get a simple univariate time series.

Parameters:field – the dataset field name to extract
Returns:numpy array
pyFTS.data.NASDAQ.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

SONDA dataset

SONDA - Sistema de Organização Nacional de Dados Ambientais, from INPE - Instituto Nacional de Pesquisas Espaciais, Brasil.

Brasilia station

Source: http://sonda.ccst.inpe.br/

pyFTS.data.SONDA.get_data(field: str) → numpy.ndarray[source]

Get a simple univariate time series.

Parameters:field – the dataset field name to extract
Returns:numpy array
pyFTS.data.SONDA.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

S&P 500 dataset

S&P500 - Standard & Poor’s 500

Daily averaged index, by business day, from 1950 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.SP500.get_data() → numpy.ndarray[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.SP500.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

TAIEX dataset

The Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)

Daily averaged index by business day, from 1995 to 2014.

Source: http://www.twse.com.tw/en/products/indices/Index_Series.php

pyFTS.data.TAIEX.get_data() → numpy.ndarray[source]

Get the univariate time series data.

Returns:numpy array
pyFTS.data.TAIEX.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame

Henon chaotic time series

M. Hénon. “A two-dimensional mapping with a strange attractor”. Commun. Math. Phys. 50, 69-77 (1976)

x(t) = a + b * y(t-1) - x(t-1)^2
y(t) = x(t-1)

pyFTS.data.henon.get_data(var: str, a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame[source]

Get a simple univariate time series.

Parameters:var – the dataset field name to extract
Returns:numpy array
pyFTS.data.henon.get_dataframe(a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame[source]

Return a dataframe with the bivariate Henon Map time series (x, y).

Parameters:
  • a – Equation coefficient
  • b – Equation coefficient
  • initial_values – numpy array with the initial values of x and y. Default: [1, 1]
  • iterations – number of iterations. Default: 1000
Returns:

Pandas DataFrame with the x and y values
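Reading the Hénon equations above as discrete update rules, one iteration computes the new x from the previous x and y, and the new y is simply the previous x. The sketch below follows that reading. The name henon_sketch is hypothetical, the function returns plain lists rather than a DataFrame, and including the initial point in the output is an assumption.

```python
def henon_sketch(a=1.4, b=0.3, initial_values=(1.0, 1.0), iterations=1000):
    """Iterate x(t) = a + b*y(t-1) - x(t-1)^2, y(t) = x(t-1)."""
    x, y = initial_values
    xs, ys = [x], [y]  # assumption: the initial point is part of the output
    for _ in range(iterations):
        # Tuple assignment evaluates both right-hand sides with the old x and y.
        x, y = a + b * y - x * x, x
        xs.append(x)
        ys.append(y)
    return xs, ys
```

The simultaneous assignment matters: updating x first and then reading it for y would feed the new value into the second equation instead of the previous one.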

Logistic map chaotic time series

May, Robert M. (1976). “Simple mathematical models with very complicated dynamics”. Nature. 261 (5560): 459–467. doi:10.1038/261459a0.

x(t) = r * x(t-1) * (1 - x(t-1))

pyFTS.data.logistic_map.get_data(r: float = 4, initial_value: float = 0.3, iterations: int = 100) → list[source]

Return a list with the logistic map chaotic time series.

Parameters:
  • r – Equation coefficient
  • initial_value – Initial value of x. Default: 0.3
  • iterations – number of iterations. Default: 100
Returns:

a list with the logistic map time series
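The recurrence x(t) = r * x(t-1) * (1 - x(t-1)) translates directly into a loop. This is a minimal sketch: logistic_map_sketch is a hypothetical name, and leaving the initial value out of the returned list is an assumption.

```python
def logistic_map_sketch(r=4.0, initial_value=0.3, iterations=100):
    """Iterate the logistic map and collect each new value."""
    x = initial_value
    series = []
    for _ in range(iterations):
        x = r * x * (1 - x)  # one application of the recurrence
        series.append(x)
    return series
```

With r = 4 and a start of 0.3, the first two values are 0.84 and 0.5376, which can be checked by hand.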

Lorenz chaotic time series

Lorenz, Edward Norton (1963). “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences. 20 (2): 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

dx/dt = a(y - x)
dy/dt = x(b - z) - y
dz/dt = xy - cz

pyFTS.data.lorentz.get_data(var: str, a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame[source]

Get a simple univariate time series.

Parameters:var – the dataset field name to extract
Returns:numpy array
pyFTS.data.lorentz.get_dataframe(a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame[source]

Return a dataframe with the multivariate Lorenz system time series (x, y, z).

Parameters:
  • a – Equation coefficient. Default value: 10
  • b – Equation coefficient. Default value: 28
  • c – Equation coefficient. Default value: 8.0/3.0
  • dt – Time differential for continuous time integration. Default value: 0.01
  • initial_values – numpy array with the initial values of x,y and z. Default: [0.1, 0, 0]
  • iterations – number of iterations. Default: 1000
Returns:

Pandas DataFrame with the x, y and z values
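The role of dt is easiest to see in a fixed-step integration of the three Lorenz equations. The sketch below uses the forward Euler method, which is an assumption about the integration scheme and not a statement about how pyFTS integrates. The name lorenz_sketch is hypothetical, and the result is a list of (x, y, z) tuples instead of a DataFrame.

```python
def lorenz_sketch(a=10.0, b=28.0, c=8.0 / 3.0, dt=0.01,
                  initial_values=(0.1, 0.0, 0.0), iterations=1000):
    """Forward Euler integration of the Lorenz equations."""
    x, y, z = initial_values
    rows = [(x, y, z)]
    for _ in range(iterations):
        # Derivatives at the current state.
        dx = a * (y - x)
        dy = x * (b - z) - y
        dz = x * y - c * z
        # Advance every variable by one time step of size dt.
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        rows.append((x, y, z))
    return rows
```

From the default start (0.1, 0, 0), the first step gives approximately (0.09, 0.028, 0.0). Euler integration accumulates error quickly on chaotic systems, so a smaller dt changes the trajectory, not just its resolution.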

Mackey-Glass chaotic time series

Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300):287-289.

dy/dt = -b * y(t) + c * y(t - tau) / (1 + y(t - tau)^10)

pyFTS.data.mackey_glass.get_data(b: float = 0.1, c: float = 0.2, tau: float = 17, initial_values: numpy.ndarray = array([0.5, 0.55882353, 0.61764706, 0.67647059, 0.73529412, 0.79411765, 0.85294118, 0.91176471, 0.97058824, 1.02941176, 1.08823529, 1.14705882, 1.20588235, 1.26470588, 1.32352941, 1.38235294, 1.44117647, 1.5 ]), iterations: int = 1000) → list[source]

Return a list with the Mackey-Glass chaotic time series.

Parameters:
  • b – Equation coefficient
  • c – Equation coefficient
  • tau – Lag parameter, default: 17
  • initial_values – numpy array with the initial values of y. Default: np.linspace(0.5,1.5,18)
  • iterations – number of iterations. Default: 1000
Returns:

a list with the Mackey-Glass time series
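The delayed term y(t - tau) is why the function needs a whole vector of initial values: each new point depends on the value tau steps back. The sketch below makes this explicit with a unit-step Euler update, which is an assumption about the discretisation. The name mackey_glass_sketch is hypothetical, tau is treated as an integer lag, and only the newly generated values are returned.

```python
def mackey_glass_sketch(b=0.1, c=0.2, tau=17, initial_values=None, iterations=1000):
    """Unit-step Euler update of the Mackey-Glass delay equation."""
    if initial_values is None:
        # 18 evenly spaced points from 0.5 to 1.5, like np.linspace(0.5, 1.5, 18).
        initial_values = [0.5 + i / 17 for i in range(18)]
    y = list(initial_values)
    if len(y) <= tau:
        raise ValueError("need at least tau + 1 initial values")
    start = len(y)
    for _ in range(iterations):
        delayed = y[-1 - tau]  # value tau steps before the latest one
        y.append(y[-1] - b * y[-1] + c * delayed / (1 + delayed ** 10))
    return y[start:]
```

With the default history of 18 points and tau = 17, the first update reads the very first history value, so a shorter history would have no delayed value to use.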

Rossler chaotic time series

O. E. Rössler, Phys. Lett. 57A, 397 (1976).

dx/dt = -z - y
dy/dt = x + ay
dz/dt = b + z(x - c)

pyFTS.data.rossler.get_data(var: str, a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → numpy.ndarray[source]

Get a simple univariate time series.

Parameters:var – the dataset field name to extract
Returns:numpy array
pyFTS.data.rossler.get_dataframe(a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → pandas.core.frame.DataFrame[source]

Return a dataframe with the multivariate Rössler system time series (x, y, z).

Parameters:
  • a – Equation coefficient. Default value: 0.2
  • b – Equation coefficient. Default value: 0.2
  • c – Equation coefficient. Default value: 5.7
  • dt – Time differential for continuous time integration. Default value: 0.01
  • initial_values – numpy array with the initial values of x,y and z. Default: [0.001, 0.001, 0.001]
  • iterations – number of iterations. Default: 5000
Returns:

Pandas DataFrame with the x, y and z values
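As a rough illustration of how the coefficients and dt enter the computation, the Rössler equations can be stepped with forward Euler. The integration scheme is an assumption, rossler_sketch is a hypothetical name, and the output is a list of (x, y, z) tuples that includes the initial state.

```python
def rossler_sketch(a=0.2, b=0.2, c=5.7, dt=0.01,
                   initial_values=(0.001, 0.001, 0.001), iterations=5000):
    """Forward Euler integration of the Rössler equations."""
    x, y, z = initial_values
    rows = [(x, y, z)]
    for _ in range(iterations):
        # Derivatives at the current state.
        dx = -z - y
        dy = x + a * y
        dz = b + z * (x - c)
        # Advance every variable by one time step of size dt.
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        rows.append((x, y, z))
    return rows
```

Only the dz equation is nonlinear (the z * x product), which is what produces the occasional excursions out of the x-y plane.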

Sunspots dataset

Monthly sunspot numbers from 1749 to May 2016

Source: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/SUNSPOT/

pyFTS.data.sunspots.get_data() → numpy.ndarray[source]

Get a simple univariate time series.

Returns:numpy array
pyFTS.data.sunspots.get_dataframe() → pandas.core.frame.DataFrame[source]

Get the complete multivariate time series data.

Returns:Pandas DataFrame