dsipts.data_management package¶
Submodules¶
dsipts.data_management.monash module¶
- class dsipts.data_management.monash.Monash(filename: str, baseUrl: str = 'https://forecastingdata.org/', rebuild: bool = False)¶
Bases: object
Class for downloading the datasets listed at https://forecastingdata.org/.
- Parameters:
filename (str) – name given to this instance, used when saving it
baseUrl (str, optional) – URL of the source page. Defaults to ‘https://forecastingdata.org/’.
rebuild (bool, optional) – if True, the table is loaded from the webpage; otherwise it is loaded from the saved file. Defaults to False.
- download_dataset(path: str, id: int, rebuild=False) None ¶
Download a specific dataset.
- Parameters:
path (str) – path in which to save the data
id (int) – id of the dataset
rebuild (bool, optional) – if True, the dataset is downloaded again. Defaults to False.
- generate_dataset(id: int) None | DataFrame ¶
Parse the id-th dataset into a convenient format and return it as a pandas DataFrame.
- Parameters:
id (int) – id of the dataset
- Returns:
dataframe
- Return type:
None or pd.DataFrame
- load(filename: str) None ¶
Load a Monash structure.
- Parameters:
filename (str) – filename to load
- save(filename: str) None ¶
Save the Monash structure.
- Parameters:
filename (str) – name of the file to generate
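The methods above can be combined into a short workflow. The following is a minimal sketch, assuming that a dataset has to be downloaded with download_dataset before generate_dataset can parse it; the helper name fetch_monash_dataset, its parameters and the default table name are hypothetical and not part of dsipts.

```python
def fetch_monash_dataset(dataset_id, data_dir, table_name="monash_table",
                         rebuild_table=False):
    """Download one dataset and return it parsed (hypothetical helper)."""
    # Imported lazily so that the sketch can be read without dsipts installed.
    from dsipts.data_management.monash import Monash

    # rebuild_table=True loads the dataset table from the webpage;
    # False loads it from the previously saved file.
    monash = Monash(filename=table_name, rebuild=rebuild_table)

    # Assumption: the raw files must be present on disk before parsing.
    monash.download_dataset(path=data_dir, id=dataset_id, rebuild=False)

    # According to the signature this is a DataFrame or None,
    # so callers should check the result before using it.
    return monash.generate_dataset(dataset_id)
```

On a first run, when no saved table exists yet, passing rebuild_table=True is probably needed; a None result should be handled by the caller.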
- dsipts.data_management.monash.convert_tsf_to_dataframe(full_file_path_and_name: str, replace_missing_vals_with: str = 'NaN', value_column_name: str = 'series_value') DataFrame ¶
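Convert a .tsf file into a pandas DataFrame. This function was copied from the repo.
- Parameters:
full_file_path_and_name (str) – full path and name of the file to read
replace_missing_vals_with (str, optional) – value used to replace missing values. Defaults to “NaN”.
value_column_name (str, optional) – name of the column that holds the series values. Defaults to “series_value”.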
- Raises:
Exception – see https://forecastingdata.org/ for more information
- Returns:
the selected time series
- Return type:
pd.DataFrame
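To make the role of replace_missing_vals_with and value_column_name more concrete, here is a simplified sketch of how a .tsf text can be parsed. It is not the copied function: the toy file content, the helper name parse_tsf_text and the list-of-dicts return value are assumptions made for illustration, and only the basic header tags and colon-separated data lines are handled.

```python
from datetime import datetime

# Toy content in the .tsf layout (made up for this example).
SAMPLE_TSF = """\
# Toy file
@relation toy
@attribute series_name string
@attribute start_timestamp date
@frequency daily
@missing true
@data
T1:2020-01-01 00-00-00:1.0,2.0,?
T2:2020-01-03 00-00-00:5.5,?,7.5
"""


def parse_tsf_text(text, replace_missing_vals_with="NaN",
                   value_column_name="series_value"):
    """Parse .tsf text into (rows, metadata); hypothetical helper."""
    attributes = []  # (name, type) pairs taken from the @attribute lines
    metadata = {}    # remaining header tags, e.g. frequency
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            if line.startswith("@data"):
                in_data = True
            elif line.startswith("@attribute"):
                _, name, kind = line.split(" ")
                attributes.append((name, kind))
            elif line.startswith("@"):
                tag, _, value = line[1:].partition(" ")
                metadata[tag] = value
            continue
        # Data line: attribute values first, comma-separated series last.
        *fields, values = line.split(":")
        row = {}
        for (name, kind), field in zip(attributes, fields):
            if kind == "date":
                field = datetime.strptime(field, "%Y-%m-%d %H-%M-%S")
            row[name] = field
        # Simplification: "?" marks a missing value and the replacement
        # is converted to float like every other entry.
        row[value_column_name] = [
            float(replace_missing_vals_with) if v == "?" else float(v)
            for v in values.split(",")
        ]
        rows.append(row)
    return rows, metadata
```

Calling parse_tsf_text(SAMPLE_TSF) yields one dictionary per series; passing that list to pd.DataFrame would give a table with one row per series.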
- dsipts.data_management.monash.get_freq(freq) str ¶
Get the pandas frequency corresponding to the reported string. Not all possible frequency strings may be covered.
- Parameters:
freq (str) – string coming from
- Returns:
pandas frequency format
- Return type:
str
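The kind of translation described above can be sketched as a lookup table. The mapping below is hypothetical and is not the table used by get_freq: the input strings are assumed to be frequency names as they appear in .tsf headers, and the outputs are pandas offset aliases.

```python
# Hypothetical mapping from frequency names to pandas offset aliases.
# Monthly and yearly entries are left out on purpose, because their
# alias spelling changed between pandas versions.
FREQ_TO_PANDAS = {
    "daily": "D",
    "weekly": "W",
    "hourly": "h",  # spelled "H" in older pandas releases
    "half_hourly": "30min",
    "10_minutes": "10min",
    "minutely": "min",
}


def sketch_get_freq(freq: str) -> str:
    """Return the pandas alias for a frequency name (illustrative only)."""
    try:
        return FREQ_TO_PANDAS[freq]
    except KeyError:
        # Fail loudly rather than guess when the name is not covered.
        raise ValueError(f"no pandas alias known for frequency {freq!r}") from None
```

Raising an error for unknown names, instead of returning a default, makes gaps in the table visible as soon as a new dataset is loaded.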
dsipts.data_management.public_datasets module¶
- dsipts.data_management.public_datasets.build_venice(path: str, url='https://www.comune.venezia.it/it/content/archivio-storico-livello-marea-venezia-1') None ¶
- dsipts.data_management.public_datasets.read_public_dataset(path: str, dataset: str) Tuple[DataFrame, List[str]] ¶
Return the chosen public dataset. Please download the datasets from https://drive.google.com/drive/folders/1ZOYpTUa82_jCcxIdTmyr0LXQfvaM9vIy or ask agobbi@fbk.eu.
- Parameters:
path (str) – path to data
dataset (str) – name of the dataset (one of ‘electricity’, ‘etth1’, ‘etth2’, ‘ettm1’, ‘ettm2’, ‘exchange_rate’, ‘illness’, ‘traffic’, ‘weather’)
- Returns:
the dataframe, in which the target variable is y and the time index is time, together with the list of covariates
- Return type:
Tuple[pd.DataFrame,List[str]]
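A typical call unpacks the returned tuple into the dataframe and the covariate list. The wrapper below is a minimal sketch: the helper name load_public_dataset and the up-front name check are hypothetical additions, and the data are assumed to have been downloaded into path beforehand.

```python
# Dataset names as listed in the documentation of read_public_dataset.
KNOWN_DATASETS = (
    "electricity", "etth1", "etth2", "ettm1", "ettm2",
    "exchange_rate", "illness", "traffic", "weather",
)


def load_public_dataset(path, dataset="weather"):
    """Validate the name, then delegate to dsipts (hypothetical helper)."""
    if dataset not in KNOWN_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {KNOWN_DATASETS}")

    # Imported lazily so that the sketch can be read without dsipts installed.
    from dsipts.data_management.public_datasets import read_public_dataset

    # data: DataFrame with target 'y' and time index 'time';
    # covariates: list of strings naming the covariates.
    data, covariates = read_public_dataset(path, dataset)
    return data, covariates
```

Checking the name before the call gives an immediate, readable error for a misspelled dataset instead of a failure deeper in the loader.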