dsipts.data_structure.time_series_d1 module
Time Series D1 Layer Module
This module provides the D1 layer for time series data handling:
- MultiSourceTSDataSet: Handles raw data from multiple CSV files
Key Features:
- Supports multiple CSV files with different groups
- Handles regular time intervals
- Processes data in chunks for memory-efficient operation
- Handles categorical encoding and normalization
- Preserves NaN values for the D2 layer to handle
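The chunked processing mentioned above can be illustrated with the `chunksize` option of pandas' `read_csv`. This is only a minimal sketch, not the module's actual implementation: the helper name `collect_groups_in_chunks`, the column names, and the in-memory CSV are assumptions made for illustration.

```python
import io

import pandas as pd


def collect_groups_in_chunks(csv_source, group_col, chunk_size=10000):
    """Scan a CSV in fixed-size chunks and count rows per group.

    Hypothetical helper for illustration; not part of dsipts.
    """
    counts = {}
    # With chunksize set, read_csv yields DataFrames of at most chunk_size rows,
    # so the whole file never has to be held in memory at once.
    for chunk in pd.read_csv(csv_source, chunksize=chunk_size):
        for group, n in chunk[group_col].value_counts().items():
            counts[group] = counts.get(group, 0) + int(n)
    return counts


# Tiny in-memory CSV standing in for a real file (assumed layout).
csv_text = "group,time,value\na,1,0.5\na,2,\nb,1,1.5\nb,2,2.5\nb,3,3.5\n"
group_counts = collect_groups_in_chunks(io.StringIO(csv_text), "group", chunk_size=2)
```

Only the per-group counters persist between chunks, which is what keeps peak memory bounded by `chunk_size` rather than by file size.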
- dsipts.data_structure.time_series_d1.extend_time_df(df, time_col, freq, group_cols=None, max_length=None)
Extend a dataframe to ensure regular time intervals.
- Parameters:
df – Input dataframe containing time series data
time_col – Column name containing time information
freq – Frequency to use for extending the dataframe
group_cols – Optional list of columns identifying groups
max_length – Optional maximum length for the extended dataframe
- Returns:
DataFrame extended to regular time intervals, with all original columns preserved
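As a rough picture of what extending a dataframe to regular time intervals can look like, the sketch below reindexes each group onto a regular grid with pandas and leaves the newly created rows as NaN. It is a hypothetical stand-in (`regularize_time_index`), not the source of `extend_time_df`; it assumes unique timestamps within each group and omits `max_length`, whose exact semantics are not described here.

```python
import pandas as pd


def regularize_time_index(df, time_col, freq, group_cols=None):
    """Reindex each group onto a regular time grid; new rows hold NaN.

    Hypothetical illustration; not the dsipts implementation.
    """
    if isinstance(group_cols, str):
        group_cols = [group_cols]

    def _extend(group_df):
        # Regular grid spanning the observed range of this group.
        full_range = pd.date_range(
            group_df[time_col].min(), group_df[time_col].max(), freq=freq
        )
        out = group_df.set_index(time_col).reindex(full_range)
        out.index.name = time_col
        return out.reset_index()

    if not group_cols:
        return _extend(df)

    pieces = []
    for keys, group_df in df.groupby(group_cols, sort=False):
        extended = _extend(group_df)
        # Restore the group identifiers on the rows that reindex created.
        keys = keys if isinstance(keys, tuple) else (keys,)
        for col, key in zip(group_cols, keys):
            extended[col] = key
        pieces.append(extended)
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame(
    {
        "series": ["a", "a", "b", "b"],
        "time": pd.to_datetime(
            ["2024-01-01", "2024-01-03", "2024-01-01", "2024-01-02"]
        ),
        "value": [1.0, 3.0, 10.0, 20.0],
    }
)
extended = regularize_time_index(df, "time", "D", group_cols=["series"])
```

Here series `a` gains one row for the missing day, with `value` left as NaN, while series `b` is already regular and is returned unchanged.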
- class dsipts.data_structure.time_series_d1.MultiSourceTSDataSet(file_paths, group_cols, time_col, feature_cols, target_cols, static_cols=None, cat_cols=None, num_cols=None, known_cols=None, unknown_cols=None, weights=None, memory_efficient=False, chunk_size=10000)
Bases: Dataset
Layer 1 (D1) dataset for multi-source time series data.
This dataset:
1. Loads time series data from multiple CSV files
2. Handles categorical encoding and normalization
3. Processes data in chunks for memory-efficient operation
4. Preserves NaN values for the D2 layer to handle
It does NOT compute validity of windows or create sliding windows - that is the responsibility of the D2 layer (TSDataProcessor).
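To make the combination of encoding, normalization, and NaN preservation concrete, here is a small sketch in plain pandas. The function `encode_and_normalize`, the integer encoding via `pd.factorize`, and the z-score scaling are assumptions chosen for illustration; the class may use different encoders or scalers internally.

```python
import numpy as np
import pandas as pd


def encode_and_normalize(df, cat_cols, num_cols):
    """Integer-encode categoricals and z-score numericals, keeping NaN as NaN.

    Hypothetical sketch; not the dsipts implementation.
    """
    out = df.copy()
    for col in cat_cols:
        # factorize assigns codes 0..k-1 in order of first appearance.
        codes, _ = pd.factorize(out[col])
        out[col] = codes
    for col in num_cols:
        # mean() and std() skip NaN by default, so gaps do not distort them,
        # and the arithmetic below leaves NaN entries as NaN.
        mean, std = out[col].mean(), out[col].std()
        out[col] = (out[col] - mean) / (std if std > 0 else 1.0)
    return out


df = pd.DataFrame(
    {
        "station": ["north", "south", "north", "south"],
        "temp": [10.0, np.nan, 14.0, 12.0],
    }
)
result = encode_and_normalize(df, cat_cols=["station"], num_cols=["temp"])
```

The missing temperature stays missing after scaling, so a downstream layer can still decide how to treat it.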
Initialize the MultiSourceTSDataSet.
- Parameters:
file_paths (List[str]) – List of paths to CSV files containing time series data
group_cols (str | List[str]) – Column(s) that identify unique time series groups
time_col (str) – Column containing time/date information
static_cols (List[str] | None) – Columns with static (non-time-varying) features
cat_cols (List[str] | None) – Categorical columns that need encoding
num_cols (List[str] | None) – Numerical columns (if None, all non-categorical columns are treated as numerical)
known_cols (List[str] | None) – Columns that are known at prediction time (if None, all feature_cols are considered known)
unknown_cols (List[str] | None) – Columns that are unknown at prediction time (if None, all target_cols are considered unknown)
weights (str | None) – Name of weights column
memory_efficient (bool) – Whether to use memory-efficient mode
chunk_size (int) – Chunk size for processing data (used in memory-efficient mode)
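The `None` defaults described for `num_cols`, `known_cols`, and `unknown_cols` can be summarized in a few lines. The helper below is hypothetical (`resolve_column_roles` is not a dsipts function), and it assumes that "all non-categorical columns" refers to the union of `feature_cols` and `target_cols`.

```python
def resolve_column_roles(feature_cols, target_cols, cat_cols=None,
                         num_cols=None, known_cols=None, unknown_cols=None):
    """Apply the documented None-defaults for column roles.

    Hypothetical helper for illustration; not part of dsipts.
    """
    cat_cols = list(cat_cols or [])
    # Assumption: the candidate columns are features plus targets, de-duplicated
    # while keeping their original order.
    all_cols = list(dict.fromkeys(list(feature_cols) + list(target_cols)))
    if num_cols is None:
        # Every non-categorical column is treated as numerical.
        num_cols = [c for c in all_cols if c not in cat_cols]
    if known_cols is None:
        # All feature columns are considered known at prediction time.
        known_cols = list(feature_cols)
    if unknown_cols is None:
        # All target columns are considered unknown at prediction time.
        unknown_cols = list(target_cols)
    return {
        "cat_cols": cat_cols,
        "num_cols": num_cols,
        "known_cols": known_cols,
        "unknown_cols": unknown_cols,
    }


roles = resolve_column_roles(
    feature_cols=["temp", "station"],
    target_cols=["load"],
    cat_cols=["station"],
)
```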
- __getitem__(idx)
Get data for a specific file-group combination by index.
This method:
1. Maps the index to a specific file-group combination
2. Loads all data for that combination
3. Converts data to appropriate formats for model consumption
- Parameters:
idx – Index of the file-group combination to retrieve
- Returns:
Dictionary containing group data with keys:
'x': Feature tensor
'y': Target tensor
't': Time values (as NumPy array)
'w': Weight tensor
'group_id': Group identifier
'st': Static features
- Return type:
dict
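The index-to-combination mapping described for `__getitem__` might look roughly like the toy class below. It is a sketch under stated assumptions, not the real class: `FileGroupIndex` is a hypothetical name, the sources are in-memory DataFrames rather than CSV files, NumPy arrays stand in for tensors, weights are uniform, and no static columns are used.

```python
import numpy as np
import pandas as pd


class FileGroupIndex:
    """Toy dataset mapping an integer index to one (source, group) pair.

    Hypothetical illustration; not the dsipts implementation.
    """

    def __init__(self, sources, group_col, time_col, feature_cols, target_cols):
        self.sources = sources
        self.group_col = group_col
        self.time_col = time_col
        self.feature_cols = feature_cols
        self.target_cols = target_cols
        # One entry per (source, group) combination, in a fixed order.
        self.index = [
            (name, group)
            for name, frame in sources.items()
            for group in frame[group_col].unique()
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        name, group = self.index[idx]
        frame = self.sources[name]
        rows = frame[frame[self.group_col] == group].sort_values(self.time_col)
        return {
            "x": rows[self.feature_cols].to_numpy(dtype=np.float32),
            "y": rows[self.target_cols].to_numpy(dtype=np.float32),
            "t": rows[self.time_col].to_numpy(),
            "w": np.ones(len(rows), dtype=np.float32),  # assumed uniform weights
            "group_id": group,
            "st": np.empty(0, dtype=np.float32),  # no static columns in this toy
        }


# Two in-memory frames standing in for two CSV files (names are made up).
sources = {
    "a.csv": pd.DataFrame({"group": ["g1", "g1", "g2"], "time": [1, 2, 1],
                           "temp": [1.0, 2.0, 3.0], "load": [10.0, 20.0, 30.0]}),
    "b.csv": pd.DataFrame({"group": ["g1", "g1"], "time": [2, 1],
                           "temp": [5.0, 4.0], "load": [50.0, 40.0]}),
}
ds = FileGroupIndex(sources, "group", "time", ["temp"], ["load"])
item = ds[2]  # third combination: group "g1" from "b.csv"
```

Because the same group name can appear in several sources, the index is keyed on the pair rather than on the group alone, so `len(ds)` counts combinations, not distinct groups.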