dsipts.data_management package

Submodules

dsipts.data_management.monash module

class dsipts.data_management.monash.Monash(filename: str, baseUrl: str = 'https://forecastingdata.org/', rebuild: bool = False)

Bases: object

Class for downloading the datasets listed at https://forecastingdata.org/.

Parameters:
  • filename (str) – name of the file used to save the object

  • baseUrl (str, optional) – url to the source page. Defaults to ‘https://forecastingdata.org/’.

  • rebuild (bool, optional) – if True, the table is loaded from the webpage; otherwise it is loaded from the saved file. Defaults to False.
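The `rebuild` flag follows a cache-or-fetch pattern: reuse the saved table when it exists, otherwise fetch it from the source and save it. A minimal sketch of that pattern, assuming the table is cached with `pickle`; `load_table` and its `fetch` callable are hypothetical stand-ins, not dsipts functions:

```python
import os
import pickle
from typing import Any, Callable


def load_table(cache_file: str, fetch: Callable[[], Any], rebuild: bool = False) -> Any:
    """Return the dataset table, fetching it only when needed (hypothetical helper)."""
    if not rebuild and os.path.exists(cache_file):
        # Reuse the table saved by a previous run.
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    # rebuild=True, or no saved file yet: fetch from the source and cache the result.
    table = fetch()
    with open(cache_file, "wb") as f:
        pickle.dump(table, f)
    return table
```

With `rebuild=False` the source is contacted only on the first call; later calls read the local file.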

download_dataset(path: str, id: int, rebuild=False) → None

Download a specific dataset.

Parameters:
  • path (str) – path in which to save the data

  • id (int) – id of the dataset

  • rebuild (bool, optional) – if True, the dataset is downloaded again. Defaults to False.

generate_dataset(id: int) → None | DataFrame

Parse the id-th dataset into a convenient format and return it as a pandas DataFrame.

Parameters:

id (int) – id of the dataset

Returns:

dataframe

Return type:

None or pd.DataFrame

load(filename: str) → None

Load a saved Monash structure.

Parameters:

filename (str) – filename to load

save(filename: str) → None

Save the Monash structure.

Parameters:

filename (str) – name of the file to generate
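`save` and `load` form a round trip: whatever `save` writes, `load` restores into an existing object. One way to implement such a pair, assuming the instance attributes are pickled; `Catalog` is a hypothetical class, and dsipts may use a different storage format:

```python
import pickle


class Catalog:
    """Hypothetical stand-in for an object holding a table of datasets."""

    def __init__(self, table=None):
        self.table = table if table is not None else {}

    def save(self, filename: str) -> None:
        # Persist every instance attribute in a single pickle file.
        with open(filename, "wb") as f:
            pickle.dump(self.__dict__, f)

    def load(self, filename: str) -> None:
        # Restore, in place, the attributes written by `save`.
        with open(filename, "rb") as f:
            self.__dict__.update(pickle.load(f))
```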

dsipts.data_management.monash.convert_tsf_to_dataframe(full_file_path_and_name: str, replace_missing_vals_with: str = 'NaN', value_column_name: str = 'series_value') → DataFrame

Convert a .tsf file into a pandas DataFrame. This function is copied from the original Monash forecasting repository.

Parameters:
  • full_file_path_and_name (str) – full path of the .tsf file

  • replace_missing_vals_with (str, optional) – value used to replace missing values. Defaults to “NaN”.

  • value_column_name (str, optional) – name of the column holding the series values. Defaults to “series_value”.

Raises:

Exception – if the file does not follow the .tsf format; see https://forecastingdata.org/ for more information

Returns:

the selected time series

Return type:

pd.DataFrame
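A .tsf file has header lines starting with `@` (including `@attribute <name> <type>` declarations), an `@data` marker, and then one colon-separated line per series whose last field holds comma-separated values, with `?` marking missing entries. A simplified, stdlib-only parser sketch under those assumptions; `parse_tsf_lines` is a hypothetical name, it keeps attribute values as strings, and it returns plain records rather than the DataFrame built by `convert_tsf_to_dataframe`:

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, List[float]]]


def parse_tsf_lines(
    lines: List[str],
    replace_missing_vals_with: str = "NaN",
    value_column_name: str = "series_value",
) -> List[Record]:
    """Parse the lines of a .tsf file into one record per series (simplified)."""
    attributes: List[str] = []
    records: List[Record] = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the attribute name.
                parts = line.split()
                if len(parts) != 3:
                    raise ValueError("Invalid meta-data specification.")
                attributes.append(parts[1])
            elif lower == "@data":
                in_data = True
            # Other headers (@relation, @frequency, @horizon, ...) are ignored here.
            continue
        # Data line: "<attr_1>:...:<attr_n>:<v1>,<v2>,...".
        fields = line.split(":")
        if len(fields) != len(attributes) + 1:
            raise ValueError("Missing attributes/values in series.")
        values = [
            float(replace_missing_vals_with) if v == "?" else float(v)
            for v in fields[-1].split(",")
        ]
        record: Record = dict(zip(attributes, fields[:-1]))
        record[value_column_name] = values
        records.append(record)
    if not in_data:
        raise ValueError("Missing @data tag.")
    return records
```

Passing the returned records to `pd.DataFrame` yields one row per series, with the values stored as lists in the `value_column_name` column.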

dsipts.data_management.monash.get_freq(freq) → str

Map the frequency string reported in a dataset to the corresponding pandas frequency format. The mapping may not cover all possible values.

Parameters:

freq (str) – frequency string as reported in the dataset

Returns:

pandas frequency format

Return type:

str
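Because the mapping may be incomplete, one way to structure such a lookup is an explicit table plus a fallback for `<number>_<unit>` strings, failing loudly on anything else. A sketch; the frequency names and pandas aliases below are assumptions (pandas aliases have also changed across versions, e.g. `H` became `h`), and `tsf_freq_to_pandas` is hypothetical:

```python
import re

# Assumed mapping from dataset frequency names to pandas offset aliases.
_FREQ_MAP = {
    "yearly": "YS",
    "quarterly": "QS",
    "monthly": "MS",
    "weekly": "W",
    "daily": "D",
    "hourly": "h",
    "half_hourly": "30min",
    "minutely": "min",
}
_UNITS = {"seconds": "s", "minutes": "min", "hours": "h"}


def tsf_freq_to_pandas(freq: str) -> str:
    key = freq.strip().lower()
    if key in _FREQ_MAP:
        return _FREQ_MAP[key]
    # Fallback for strings such as "10_minutes" or "4_seconds".
    match = re.fullmatch(r"(\d+)_(seconds|minutes|hours)", key)
    if match:
        return match.group(1) + _UNITS[match.group(2)]
    # Unknown strings raise instead of silently returning a wrong frequency.
    raise ValueError(f"Unsupported frequency: {freq!r}")
```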

dsipts.data_management.public_datasets module

dsipts.data_management.public_datasets.build_venice(path: str, url='https://www.comune.venezia.it/it/content/archivio-storico-livello-marea-venezia-1') → None

dsipts.data_management.public_datasets.read_public_dataset(path: str, dataset: str) → Tuple[DataFrame, List[str]]

Return the chosen public dataset. Please download the datasets from https://drive.google.com/drive/folders/1ZOYpTUa82_jCcxIdTmyr0LXQfvaM9vIy or ask agobbi@fbk.eu.

Parameters:
  • path (str) – path to data

  • dataset (str) – name of the dataset, one of ‘electricity’, ‘etth1’, ‘etth2’, ‘ettm1’, ‘ettm2’, ‘exchange_rate’, ‘illness’, ‘traffic’, ‘weather’

Returns:

A DataFrame in which the target variable is y and the time index is time, together with the list of the covariates

Return type:

Tuple[pd.DataFrame,List[str]]
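The return convention (target `y`, time index `time`, plus a list of covariates) can be made explicit with a small check on the dataset name and the columns. A sketch, assuming `time` and `y` are ordinary columns and every other column is a covariate; `split_columns` is a hypothetical helper, not part of dsipts:

```python
from typing import List, Sequence

# Dataset names accepted by read_public_dataset, as listed in its docstring.
PUBLIC_DATASETS = {
    "electricity", "etth1", "etth2", "ettm1", "ettm2",
    "exchange_rate", "illness", "traffic", "weather",
}


def split_columns(dataset: str, columns: Sequence[str]) -> List[str]:
    """Validate the dataset name and return the covariate column names."""
    if dataset not in PUBLIC_DATASETS:
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of {sorted(PUBLIC_DATASETS)}"
        )
    missing = {"time", "y"} - set(columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    # Everything except the time index and the target is treated as a covariate.
    return [c for c in columns if c not in ("time", "y")]
```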

Module contents