pyFTS.data package

Module contents

Module providing the standard pyFTS dataset facilities.

Submodules

pyFTS.data.common module

pyFTS.data.common.get_dataframe(filename: str, url: str, sep: str = ';', compression: str = 'infer') → pandas.core.frame.DataFrame

Checks whether filename already exists locally. If it does, the file is read and its data returned. If it does not, the file is first downloaded from url and decompressed.

Parameters:

filename – dataset local filename
url – dataset internet URL
sep – CSV field separator
compression – type of compression

Returns:	Pandas DataFrame
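The download-and-cache behaviour described above can be sketched with pandas alone. This is a minimal sketch and not the pyFTS source. `cached_dataframe` is a hypothetical name. The sketch relies on `pandas.read_csv` accepting either a local path or an http(s) URL, and on it decompressing according to `compression`.

```python
import os

import pandas as pd


def cached_dataframe(filename, url, sep=";", compression="infer"):
    """Hypothetical helper (not pyFTS code): download once, then reuse a local copy."""
    if os.path.isfile(filename):
        # Cache hit: read the local copy directly
        return pd.read_csv(filename, sep=sep, compression=compression)
    # Cache miss: pandas fetches the remote file and decompresses it
    # ('infer' guesses the codec from the file extension)
    df = pd.read_csv(url, sep=sep, compression=compression)
    # Store a local copy so the next call skips the download
    df.to_csv(filename, sep=sep, index=False, compression=compression)
    return df
```

If you call it twice with the same `filename`, only the first call touches `url`. The second call is served from the local file.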

Datasets

Artificial and synthetic data generators

Facilities to generate synthetic stochastic processes

class pyFTS.data.artificial.SignalEmulator(**kwargs)

Bases: object

Emulates a complex signal built from several additive and non-additive components.

blip(**kwargs)

Creates an outlier that is greater than the maximum or lower than the minimum of the previous signal values, and inserts it at a random location in the signal.

Returns:	the current SignalEmulator instance, for method chaining
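Below is a standalone sketch of the blip idea using NumPy. `insert_blip` is hypothetical and returns a new list rather than the emulator instance. The docstring does not say how far beyond the observed extreme the outlier should land. Here that distance is an arbitrary 10% to 50% of the signal range.

```python
import numpy as np


def insert_blip(signal, rng=None):
    """Hypothetical sketch: insert one out-of-range outlier at a random position."""
    rng = np.random.default_rng() if rng is None else rng
    values = list(signal)
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0  # avoid a zero-width range for constant signals
    # How far past the extreme the outlier lands (arbitrary choice)
    offset = spread * rng.uniform(0.1, 0.5)
    outlier = hi + offset if rng.random() < 0.5 else lo - offset
    position = int(rng.integers(0, len(values) + 1))  # any index, including the end
    values.insert(position, outlier)
    return values
```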

components = None: components of the signal

incremental_gaussian(mu: float, sigma: float, **kwargs)

Creates an additive Gaussian interference on the previous signal.

Parameters:

mu – increment on mean
sigma – increment on variance
start – lag index to start this signal, the default value is 0
it – Number of iterations, the default value is 1
length – Number of samples generated on each iteration, the default value is 100
vmin – Lower bound value of generated data, the default value is None
vmax – Upper bound value of generated data, the default value is None

Returns:

the current SignalEmulator instance, for method chaining

periodic_gaussian(type: str, period: int, mu_min: float, sigma_min: float, mu_max: float, sigma_max: float, **kwargs)

Creates an additive periodic Gaussian interference on the previous signal.

Parameters:

type – ‘linear’ or ‘sinoidal’
period – the period of recurrence
mu_min – initial (and minimum) mean of each period
sigma_min – initial (and minimum) variance of each period
mu_max – final (and maximum) mean of each period
sigma_max – final (and maximum) variance of each period
start – lag index to start this signal, the default value is 0
it – Number of iterations, the default value is 1
length – Number of samples generated on each iteration, the default value is 100
vmin – Lower bound value of generated data, the default value is None
vmax – Upper bound value of generated data, the default value is None

Returns:

the current SignalEmulator instance, for method chaining

run()

Renders the signal.

Returns:	a list of float values

stationary_gaussian(mu: float, sigma: float, **kwargs)

Creates a continuous Gaussian signal with mean mu and variance sigma.

Parameters:

mu – mean
sigma – variance
additive – if False, it cancels the previous signal and starts this one; if True, this signal is added to the previous one
start – lag index to start this signal, the default value is 0
it – Number of iterations, the default value is 1
length – Number of samples generated on each iteration, the default value is 100
vmin – Lower bound value of generated data, the default value is None
vmax – Upper bound value of generated data, the default value is None

Returns:

the current SignalEmulator instance, for method chaining
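Every method above returns the emulator itself, so components can be chained and rendered once by `run()`. The small class below illustrates that pattern. `TinyEmulator` is hypothetical and far simpler than `SignalEmulator`:

- It only supports additive composition and the `start`, `length` and `it` keyword arguments.
- It treats `sigma` as a standard deviation, because that is how `numpy.random.Generator.normal` is parameterized. The pyFTS docstrings call it a variance.

```python
import numpy as np


class TinyEmulator:
    """Hypothetical, simplified stand-in for SignalEmulator (not the pyFTS code)."""

    def __init__(self, seed=None):
        # Each component: (means per iteration, sigmas per iteration, start, length)
        self.components = []
        self.rng = np.random.default_rng(seed)

    def stationary_gaussian(self, mu, sigma, start=0, length=100, it=1):
        # Same mean and spread in every iteration
        self.components.append(([mu] * it, [sigma] * it, start, length))
        return self  # returning self is what makes chaining possible

    def incremental_gaussian(self, mu, sigma, start=0, length=100, it=1):
        # Mean and spread grow by mu and sigma after each iteration
        means = [mu * (k + 1) for k in range(it)]
        sigmas = [sigma * (k + 1) for k in range(it)]
        self.components.append((means, sigmas, start, length))
        return self

    def run(self):
        """Render every component and add them together into one list of floats."""
        total = max(start + len(means) * length
                    for means, _, start, length in self.components)
        signal = np.zeros(total)
        for means, sigmas, start, length in self.components:
            part = np.concatenate([self.rng.normal(m, s, length)
                                   for m, s in zip(means, sigmas)])
            signal[start:start + len(part)] += part
        return signal.tolist()


values = (TinyEmulator(seed=42)
          .stationary_gaussian(mu=2.5, sigma=0.1, length=100, it=10)
          .incremental_gaussian(mu=0.02, sigma=0.0, start=500, length=100, it=5)
          .run())
print(len(values))  # 1000
```

Deferring all sampling to `run()` keeps each builder method cheap and lets later components overlap earlier ones through `start`.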

pyFTS.data.artificial.generate_gaussian_linear(mu_ini, sigma_ini, mu_inc, sigma_inc, it=100, num=10, vmin=None, vmax=None)

Generates data sampled from a Gaussian distribution, with constant or linearly changing parameters.

Parameters:

mu_ini – Initial mean
sigma_ini – Initial variance
mu_inc – Mean increment after ‘num’ samples
sigma_inc – Variance increment after ‘num’ samples
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data

Returns:

A list of it*num float values
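The description amounts to a simple loop: draw `num` samples, apply the bounds, then shift the parameters. Below is a minimal NumPy sketch. `gaussian_linear_sketch` is a hypothetical name, and the sketch makes two assumptions:

- `sigma` is a standard deviation, as in NumPy.
- `vmin`/`vmax` are applied by clipping.

```python
import numpy as np


def gaussian_linear_sketch(mu_ini, sigma_ini, mu_inc, sigma_inc,
                           it=100, num=10, vmin=None, vmax=None, rng=None):
    """Hypothetical re-implementation: `it` blocks of `num` Gaussian samples."""
    rng = np.random.default_rng() if rng is None else rng
    mu, sigma, out = mu_ini, sigma_ini, []
    for _ in range(it):
        block = rng.normal(mu, sigma, num)
        # np.clip accepts None for one missing bound, so guard the case of none at all
        if vmin is not None or vmax is not None:
            block = np.clip(block, vmin, vmax)
        out.extend(block.tolist())
        # Parameters change linearly after every block of `num` samples
        mu += mu_inc
        sigma += sigma_inc
    return out
```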

pyFTS.data.artificial.generate_linear_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)

Generates a periodic linear variation on mean and variance

Parameters:

period – the period of recurrence
mu_min – initial (and minimum) mean of each period
sigma_min – initial (and minimum) variance of each period
mu_max – final (and maximum) mean of each period
sigma_max – final (and maximum) variance of each period
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data

Returns:

A list of it*num float values
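One reading of these parameter descriptions is a sawtooth schedule. Within each period the mean and spread move linearly from their minimum to their maximum, then restart. The sketch below samples from that schedule. It is an interpretation of the docstring, not the pyFTS code, and it makes two further assumptions:

- `period` counts iterations, i.e. blocks of `num` samples.
- `sigma` is a NumPy standard deviation.

```python
import numpy as np


def linear_periodic_sketch(period, mu_min, sigma_min, mu_max, sigma_max,
                           it=100, num=10, vmin=None, vmax=None, rng=None):
    """Hypothetical sketch: sawtooth mean/spread schedule restarting every `period` iterations."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for k in range(it):
        # Position inside the current period, from 0.0 (start) to 1.0 (end)
        phase = (k % period) / max(period - 1, 1)
        mu = mu_min + (mu_max - mu_min) * phase
        sigma = sigma_min + (sigma_max - sigma_min) * phase
        block = rng.normal(mu, sigma, num)
        if vmin is not None or vmax is not None:
            block = np.clip(block, vmin, vmax)
        out.extend(block.tolist())
    return out
```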

pyFTS.data.artificial.generate_sinoidal_periodic_gaussian(period, mu_min, sigma_min, mu_max, sigma_max, it=100, num=10, vmin=None, vmax=None)

Generates a periodic sinusoidal variation on mean and variance.

Parameters:

period – the period of recurrence
mu_min – initial (and minimum) mean of each period
sigma_min – initial (and minimum) variance of each period
mu_max – final (and maximum) mean of each period
sigma_max – final (and maximum) variance of each period
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data

Returns:

A list of it*num float values
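A smooth counterpart to the linear generator is a raised-cosine schedule. Each period starts at the minimum, peaks at the maximum halfway through, and returns to the minimum. This is an assumed interpretation: the exact waveform pyFTS uses may differ. Note that it does not end each period at the maximum, as the parameter descriptions above (copied from the linear variant) suggest.

```python
import math

import numpy as np


def sinusoidal_periodic_sketch(period, mu_min, sigma_min, mu_max, sigma_max,
                               it=100, num=10, vmin=None, vmax=None, rng=None):
    """Hypothetical sketch: raised-cosine mean/spread schedule with period `period`."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for k in range(it):
        # Weight goes smoothly 0 -> 1 -> 0 over one period
        weight = (1 - math.cos(2 * math.pi * k / period)) / 2
        mu = mu_min + (mu_max - mu_min) * weight
        sigma = sigma_min + (sigma_max - sigma_min) * weight
        block = rng.normal(mu, sigma, num)
        if vmin is not None or vmax is not None:
            block = np.clip(block, vmin, vmax)
        out.extend(block.tolist())
    return out
```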

pyFTS.data.artificial.generate_uniform_linear(min_ini, max_ini, min_inc, max_inc, it=100, num=10, vmin=None, vmax=None)

Generates data sampled from a uniform distribution, with constant or linearly changing bounds.

Parameters:

min_ini – Initial lower bound
max_ini – Initial upper bound
min_inc – Lower bound increment after ‘num’ samples
max_inc – Upper bound increment after ‘num’ samples
it – Number of iterations
num – Number of samples generated on each iteration
vmin – Lower bound value of generated data
vmax – Upper bound value of generated data

Returns:

A list of it*num float values
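The uniform variant follows the same block-by-block pattern, with the two bounds drifting instead of the mean and spread. The sketch below is hypothetical: `uniform_linear_sketch` is not a pyFTS function, and clipping to `vmin`/`vmax` is an assumption.

```python
import numpy as np


def uniform_linear_sketch(min_ini, max_ini, min_inc, max_inc,
                          it=100, num=10, vmin=None, vmax=None, rng=None):
    """Hypothetical sketch: `it` blocks of `num` uniform samples with drifting bounds."""
    rng = np.random.default_rng() if rng is None else rng
    low, high, out = min_ini, max_ini, []
    for _ in range(it):
        block = rng.uniform(low, high, num)  # samples in [low, high)
        if vmin is not None or vmax is not None:
            block = np.clip(block, vmin, vmax)
        out.extend(block.tolist())
        # Both bounds move linearly after every block
        low += min_inc
        high += max_inc
    return out
```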

pyFTS.data.artificial.random_walk(n=500, type='gaussian')

Simple random walk

Parameters:

n – number of samples
type – ‘gaussian’ or ‘uniform’

pyFTS.data.artificial.white_noise(n=500)

Simple Gaussian noise signal

Parameters:	n – number of samples
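Both generators reduce to a few NumPy calls:

- White noise is a sequence of independent standard normal draws.
- A random walk is the running sum of independent steps.

The sketch below assumes unit-variance Gaussian steps, or steps uniform on [-1, 1) for type='uniform'. The step distributions and the function names are assumptions, not taken from pyFTS. The parameter name `type` mirrors the documented signature even though it shadows the Python builtin.

```python
import numpy as np


def white_noise_sketch(n=500, rng=None):
    """Hypothetical sketch: n independent N(0, 1) samples."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, 1.0, n).tolist()


def random_walk_sketch(n=500, type="gaussian", rng=None):
    """Hypothetical sketch: running sum of n i.i.d. steps."""
    rng = np.random.default_rng() if rng is None else rng
    if type == "gaussian":
        steps = rng.normal(0.0, 1.0, n)
    elif type == "uniform":
        steps = rng.uniform(-1.0, 1.0, n)
    else:
        raise ValueError("type must be 'gaussian' or 'uniform'")
    # Each position is the previous position plus one step
    return np.cumsum(steps).tolist()
```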

AirPassengers dataset

Monthly totals of airline passengers from the USA, from January 1949 through December 1960.

Source: Hyndman, R.J., Time Series Data Library, http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.

pyFTS.data.AirPassengers.get_data() → numpy.ndarray

Get the univariate time series data.

Returns:	numpy array

pyFTS.data.AirPassengers.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

Bitcoin dataset

Bitcoin to USD quotations

Daily averaged index, by business day, from 2010 to 2018.

Source: https://finance.yahoo.com/quote/BTC-USD?p=BTC-USD

pyFTS.data.Bitcoin.get_data(field: str = 'AVG') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.Bitcoin.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame
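Most dataset modules pair `get_dataframe()` with a `get_data()` that returns a single column as a NumPy array, and several of them take a `field` argument. The sketch below assumes `get_data` simply selects that column and converts it. The descriptions suggest this, but the exact code may differ.

`get_data_sketch` is hypothetical, and the two-row DataFrame is made-up toy data. Default field names differ between modules: 'AVG' for Bitcoin, DowJones and Ethereum; 'avg' for the FOREX pairs and NASDAQ. Pandas column lookup is case-sensitive.

```python
import numpy as np
import pandas as pd


def get_data_sketch(df: pd.DataFrame, field: str = "AVG") -> np.ndarray:
    """Hypothetical: extract one column of a dataset DataFrame as a NumPy array."""
    if field not in df.columns:
        raise KeyError(f"unknown field {field!r}; available: {list(df.columns)}")
    return df[field].to_numpy()


# Toy stand-in for a get_dataframe() result (values are made up)
toy = pd.DataFrame({"Date": ["2018-01-02", "2018-01-03"], "AVG": [100.0, 101.5]})
series = get_data_sketch(toy)
```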

DowJones dataset

DJI - Dow Jones

Daily averaged index, by business day, from 1985 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.DowJones.get_data(field: str = 'AVG') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.DowJones.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

Enrollments dataset

Yearly University of Alabama enrollments from 1971 to 1992.

pyFTS.data.Enrollments.get_data() → numpy.ndarray

Get the univariate time series data.

Returns:	numpy array

pyFTS.data.Enrollments.get_dataframe() → pandas.core.frame.DataFrame

Returns:	Pandas DataFrame

Ethereum dataset

Ethereum to USD quotations

Daily averaged index, by business day, from 2016 to 2018.

Source: https://finance.yahoo.com/quote/ETH-USD?p=ETH-USD

pyFTS.data.Ethereum.get_data(field: str = 'AVG') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.Ethereum.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

EUR-GBP dataset

FOREX market EUR-GBP pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURGBP.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.EURGBP.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

EUR-USD dataset

FOREX market EUR-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.EURUSD.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.EURUSD.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

GBP-USD dataset

FOREX market GBP-USD pair.

Daily averaged quotations, by business day, from 2016 to 2018.

pyFTS.data.GBPUSD.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.GBPUSD.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

INMET dataset

INMET - Instituto Nacional de Meteorologia / Brasil

Belo Horizonte station, from 2000-01-01 to 2012-12-31

Source: http://www.inmet.gov.br

pyFTS.data.INMET.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

Malaysia dataset

Hourly Malaysia electric load and temperature

pyFTS.data.Malaysia.get_data(field: str = 'load') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – dataset field to load
Returns:	numpy array

pyFTS.data.Malaysia.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

NASDAQ module

National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ IXIC)

Daily averaged index by business day, from 2000 to 2016.

Source: http://www.nasdaq.com/aspx/flashquotes.aspx?symbol=IXIC&selected=IXIC

pyFTS.data.NASDAQ.get_data(field: str = 'avg') → numpy.ndarray

Get the univariate time series data.

Parameters:	field – the dataset field name to extract
Returns:	numpy array

pyFTS.data.NASDAQ.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

SONDA dataset

SONDA - Sistema de Organização Nacional de Dados Ambientais, from INPE - Instituto Nacional de Pesquisas Espaciais, Brasil.

Brasilia station

Source: http://sonda.ccst.inpe.br/

pyFTS.data.SONDA.get_data(field: str) → numpy.ndarray

Get the univariate time series data.

Parameters:	field – the dataset field name to extract
Returns:	numpy array

pyFTS.data.SONDA.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

S&P 500 dataset

S&P500 - Standard & Poor’s 500

Daily averaged index, by business day, from 1950 to 2017.

Source: https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC

pyFTS.data.SP500.get_data() → numpy.ndarray

Get the univariate time series data.

Returns:	numpy array

pyFTS.data.SP500.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

TAIEX dataset

The Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)

Daily averaged index by business day, from 1995 to 2014.

Source: http://www.twse.com.tw/en/products/indices/Index_Series.php

pyFTS.data.TAIEX.get_data() → numpy.ndarray

Get the univariate time series data.

Returns:	numpy array

pyFTS.data.TAIEX.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame

Henon chaotic time series

Hénon. “A two-dimensional mapping with a strange attractor”. Commun. Math. Phys. 50, 69-77 (1976)

$$x(t) = a + b\,y(t-1) - x(t-1)^2, \qquad y(t) = x(t-1)$$

pyFTS.data.henon.get_data(var: str, a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame

Get the univariate time series data.

Parameters:	var – the dataset field name to extract (‘x’ or ‘y’)
Returns:	numpy array

pyFTS.data.henon.get_dataframe(a: float = 1.4, b: float = 0.3, initial_values: list = [1, 1], iterations: int = 1000) → pandas.core.frame.DataFrame

Return a dataframe with the bivariate Hénon map time series (x, y).

Parameters:

a – Equation coefficient
b – Equation coefficient
initial_values – numpy array with the initial values of x and y. Default: [1, 1]
iterations – number of iterations. Default: 1000

Returns:

Pandas DataFrame with the x and y values
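The Hénon map is a two-line recurrence, so iterating it is straightforward. The sketch below returns a DataFrame shaped like the result of `get_dataframe()`. `henon_sketch` is hypothetical. Keeping the initial values as the first row is an assumption about how pyFTS lays out its output.

```python
import pandas as pd


def henon_sketch(a=1.4, b=0.3, initial_values=(1.0, 1.0), iterations=1000):
    """Hypothetical sketch: iterate x(t) = a + b*y(t-1) - x(t-1)**2, y(t) = x(t-1)."""
    x, y = [initial_values[0]], [initial_values[1]]
    for _ in range(iterations):
        x_prev, y_prev = x[-1], y[-1]
        x.append(a + b * y_prev - x_prev ** 2)
        y.append(x_prev)
    return pd.DataFrame({"x": x, "y": y})
```

From (1, 1), the first step gives x = 1.4 + 0.3 - 1 = 0.7 and y = 1.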

Logistic_map chaotic time series

May, Robert M. (1976). “Simple mathematical models with very complicated dynamics”. Nature. 261 (5560): 459–467. doi:10.1038/261459a0.

$$x(t) = r\,x(t-1)\,\bigl(1 - x(t-1)\bigr)$$

pyFTS.data.logistic_map.get_data(r: float = 4, initial_value: float = 0.3, iterations: int = 100) → list

Return a list with the logistic map chaotic time series.

Parameters:

r – Equation coefficient
initial_value – Initial value of x. Default: 0.3
iterations – number of iterations. Default: 100

Returns:

A list with the logistic map time series values
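Below is a direct iteration of the recurrence. `logistic_map_sketch` is hypothetical. Whether the initial value counts toward `iterations` is an assumption: here the list holds the initial value plus `iterations` new values.

```python
def logistic_map_sketch(r=4.0, initial_value=0.3, iterations=100):
    """Hypothetical sketch: iterate x(t) = r * x(t-1) * (1 - x(t-1))."""
    values = [initial_value]
    for _ in range(iterations):
        x = values[-1]
        values.append(r * x * (1 - x))
    return values
```

With r = 4 the map sends [0, 1] onto itself, so every value stays in that interval. For example, the first step gives 4 · 0.3 · 0.7 = 0.84.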

Lorenz chaotic time series

Lorenz, Edward Norton (1963). “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences. 20 (2): 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

$$\frac{dx}{dt} = a\,(y - x), \qquad \frac{dy}{dt} = x\,(b - z) - y, \qquad \frac{dz}{dt} = x\,y - c\,z$$

pyFTS.data.lorentz.get_data(var: str, a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame

Get the univariate time series data.

Parameters:	var – the dataset field name to extract (‘x’, ‘y’ or ‘z’)
Returns:	numpy array

pyFTS.data.lorentz.get_dataframe(a: float = 10.0, b: float = 28.0, c: float = 2.6666666666666665, dt: float = 0.01, initial_values: list = [0.1, 0, 0], iterations: int = 1000) → pandas.core.frame.DataFrame

Return a dataframe with the multivariate Lorenz system time series (x, y, z).

Parameters:

a – Equation coefficient. Default value: 10
b – Equation coefficient. Default value: 28
c – Equation coefficient. Default value: 8.0/3.0
dt – Time differential for continuous time integration. Default value: 0.01
initial_values – numpy array with the initial values of x, y and z. Default: [0.1, 0, 0]
iterations – number of iterations. Default: 1000

Returns:

Pandas DataFrame with the x, y and z values
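Because the Lorenz system is continuous, `dt` controls a numerical integration step. The sketch below uses the simplest scheme, forward Euler: state(t+1) = state(t) + dt · derivative. `lorenz_sketch` is hypothetical, and the choice of integrator is an assumption; pyFTS may integrate differently.

```python
import pandas as pd


def lorenz_sketch(a=10.0, b=28.0, c=8.0 / 3.0, dt=0.01,
                  initial_values=(0.1, 0.0, 0.0), iterations=1000):
    """Hypothetical sketch: forward-Euler integration of the Lorenz system."""
    xs, ys, zs = [initial_values[0]], [initial_values[1]], [initial_values[2]]
    for _ in range(iterations):
        x, y, z = xs[-1], ys[-1], zs[-1]
        # One explicit Euler step for each coordinate
        xs.append(x + dt * a * (y - x))
        ys.append(y + dt * (x * (b - z) - y))
        zs.append(z + dt * (x * y - c * z))
    return pd.DataFrame({"x": xs, "y": ys, "z": zs})
```

From the default start (0.1, 0, 0), the first step gives x = 0.1 + 0.01 · 10 · (0 - 0.1) = 0.09, y = 0.01 · 0.1 · 28 = 0.028 and z = 0.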

Mackey-Glass chaotic time series

Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197(4300):287-289.

$$\frac{dy}{dt} = -b\,y(t) + \frac{c\,y(t-\tau)}{1 + y(t-\tau)^{10}}$$

pyFTS.data.mackey_glass.get_data(b: float = 0.1, c: float = 0.2, tau: float = 17, initial_values: numpy.ndarray = array([0.5, 0.55882353, 0.61764706, 0.67647059, 0.73529412, 0.79411765, 0.85294118, 0.91176471, 0.97058824, 1.02941176, 1.08823529, 1.14705882, 1.20588235, 1.26470588, 1.32352941, 1.38235294, 1.44117647, 1.5 ]), iterations: int = 1000) → list

Return a list with the Mackey-Glass chaotic time series.

Parameters:

b – Equation coefficient
c – Equation coefficient
tau – Lag parameter. Default: 17
initial_values – numpy array with the initial values of y. Default: np.linspace(0.5,1.5,18)
iterations – number of iterations. Default: 1000

Returns:

A list with the Mackey-Glass time series values
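A common way to generate this series is to discretize the delay equation with a unit time step:

y(t+1) = y(t) - b·y(t) + c·y(t-τ)/(1 + y(t-τ)^10)

The initial values serve as the history buffer. The following are assumptions, not necessarily what pyFTS does:

- the unit step;
- using all of `initial_values` as history;
- returning only the newly generated values.

`mackey_glass_sketch` is hypothetical and requires an integer `tau`, because it uses `tau` as a list offset.

```python
import numpy as np


def mackey_glass_sketch(b=0.1, c=0.2, tau=17, initial_values=None, iterations=1000):
    """Hypothetical sketch: unit-step Euler discretization of the Mackey-Glass equation."""
    if initial_values is None:
        initial_values = np.linspace(0.5, 1.5, tau + 1)  # history y(-tau) ... y(0)
    y = [float(v) for v in initial_values]
    if len(y) <= tau:
        raise ValueError("initial_values must hold at least tau + 1 values")
    for _ in range(iterations):
        delayed = y[-1 - tau]  # y(t - tau), with y[-1] being y(t)
        y.append(y[-1] - b * y[-1] + c * delayed / (1 + delayed ** 10))
    return y[len(initial_values):]  # drop the history, keep generated values
```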

Rossler chaotic time series

O. Rössler, Phys. Lett. 57A, 397 (1976).

$$\frac{dx}{dt} = -z - y, \qquad \frac{dy}{dt} = x + a\,y, \qquad \frac{dz}{dt} = b + z\,(x - c)$$

pyFTS.data.rossler.get_data(var: str, a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → numpy.ndarray

Get the univariate time series data.

Parameters:	var – the dataset field name to extract (‘x’, ‘y’ or ‘z’)
Returns:	numpy array

pyFTS.data.rossler.get_dataframe(a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01, initial_values: numpy.ndarray = [0.001, 0.001, 0.001], iterations: int = 5000) → pandas.core.frame.DataFrame

Return a dataframe with the multivariate Rössler system time series (x, y, z).

Parameters:

a – Equation coefficient. Default value: 0.2
b – Equation coefficient. Default value: 0.2
c – Equation coefficient. Default value: 5.7
dt – Time differential for continuous time integration. Default value: 0.01
initial_values – numpy array with the initial values of x, y and z. Default: [0.001, 0.001, 0.001]
iterations – number of iterations. Default: 5000

Returns:

Pandas DataFrame with the x, y and z values
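Any ODE scheme can integrate this system with step `dt`. The sketch below uses classic fourth-order Runge-Kutta as an illustrative choice. pyFTS's own integrator is not documented here and may be a simpler Euler step. `rossler_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def rossler_sketch(a=0.2, b=0.2, c=5.7, dt=0.01,
                   initial_values=(0.001, 0.001, 0.001), iterations=5000):
    """Hypothetical sketch: classic RK4 integration of the Rössler system."""

    def deriv(s):
        x, y, z = s
        return np.array([-z - y, x + a * y, b + z * (x - c)])

    state = np.array(initial_values, dtype=float)
    rows = [state]
    for _ in range(iterations):
        # Four slope evaluations combined into one weighted step
        k1 = deriv(state)
        k2 = deriv(state + dt / 2 * k1)
        k3 = deriv(state + dt / 2 * k2)
        k4 = deriv(state + dt * k3)
        state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        rows.append(state)
    return pd.DataFrame(rows, columns=["x", "y", "z"])
```

RK4 costs four derivative evaluations per step but is far more accurate than Euler for the same `dt`.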

Sunspots dataset

Monthly sunspot numbers from 1749 to May 2016

Source: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/SUNSPOT/

pyFTS.data.sunspots.get_data() → numpy.ndarray

Get the univariate time series data.

Returns:	numpy array

pyFTS.data.sunspots.get_dataframe() → pandas.core.frame.DataFrame

Get the complete multivariate time series data.

Returns:	Pandas DataFrame
