dsipts.data_structure.time_series_d1 module

Time Series D1 Layer Module

This module provides the D1 layer for time series data handling:

- MultiSourceTSDataSet: Handles raw data from multiple CSV files

Key Features:

- Supports multiple CSV files with different groups
- Handles regular time intervals
- Processes data in chunks for memory-efficient operation
- Handles categorical encoding and normalization
- Preserves NaN values for the D2 layer to handle

dsipts.data_structure.time_series_d1.extend_time_df(df, time_col, freq, group_cols=None, max_length=None)

Extend a dataframe to ensure regular time intervals.

Parameters:

df – Input dataframe containing time series data
time_col – Column name containing time information
freq – Frequency to use for extending the dataframe
group_cols – Optional list of columns identifying groups
max_length – Optional maximum length for the extended dataframe

Returns:

DataFrame extended to regular time intervals, with all original columns preserved
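To illustrate what extending a dataframe to regular time intervals can look like, here is a minimal pandas sketch. It is not the dsipts implementation: the helper name `regularize_time_index` is hypothetical, it assumes timestamps are unique within each group, and it ignores the `max_length` option. Rows created to fill gaps are left as NaN.

```python
import pandas as pd


def regularize_time_index(df, time_col, freq, group_cols=None):
    """Hypothetical stand-in: reindex each group onto a regular time grid."""
    if isinstance(group_cols, str):
        group_cols = [group_cols]

    def _extend(group):
        # Full grid between the first and last observed timestamps.
        grid = pd.date_range(group[time_col].min(), group[time_col].max(), freq=freq)
        out = group.set_index(time_col).reindex(grid)
        out.index.name = time_col
        return out.reset_index()

    df = df.copy()
    df[time_col] = pd.to_datetime(df[time_col])
    if not group_cols:
        return _extend(df)

    parts = []
    for key, group in df.groupby(group_cols, sort=False):
        extended = _extend(group)
        # Restore the group identifiers on the rows created by reindexing.
        keys = key if isinstance(key, tuple) else (key,)
        for col, value in zip(group_cols, keys):
            extended[col] = value
        parts.append(extended)
    return pd.concat(parts, ignore_index=True)


raw = pd.DataFrame(
    {
        "time": ["2024-01-01", "2024-01-03", "2024-01-01", "2024-01-02"],
        "g": ["a", "a", "b", "b"],
        "v": [1.0, 3.0, 10.0, 20.0],
    }
)
out = regularize_time_index(raw, "time", "D", ["g"])
```

Group "a" is missing 2024-01-02, so the result gains one row for that date with `v` left as NaN, while the group column is filled back in.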

class dsipts.data_structure.time_series_d1.MultiSourceTSDataSet(file_paths, group_cols, time_col, feature_cols, target_cols, static_cols=None, cat_cols=None, num_cols=None, known_cols=None, unknown_cols=None, weights=None, memory_efficient=False, chunk_size=10000)

Bases: Dataset

Layer 1 (D1) dataset for multi-source time series data.

This dataset:

1. Loads time series data from multiple CSV files
2. Handles categorical encoding and normalization
3. Processes data in chunks for memory-efficient operation
4. Preserves NaN values for the D2 layer to handle

It does NOT compute window validity or create sliding windows; that is the responsibility of the D2 layer (TSDataProcessor).

Initialize the MultiSourceTSDataSet.

Parameters:

file_paths (List[str]) – List of paths to CSV files containing time series data
group_cols (str | List[str]) – Column(s) that identify unique time series groups
time_col (str) – Column containing time/date information
feature_cols (List[str]) – Columns to use as features (X)
target_cols (List[str]) – Columns to use as targets (y)
static_cols (List[str] | None) – Columns with static (non-time-varying) features
cat_cols (List[str] | None) – Categorical columns that need encoding
num_cols (List[str] | None) – Numerical columns (if None, all non-categorical columns are treated as numerical)
known_cols (List[str] | None) – Columns that are known at prediction time (if None, all feature_cols are considered known)
unknown_cols (List[str] | None) – Columns that are unknown at prediction time (if None, all target_cols are considered unknown)
weights (str | None) – Name of weights column
memory_efficient (bool) – Whether to use memory-efficient mode
chunk_size (int) – Chunk size for processing data (used in memory-efficient mode)
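The chunked processing, categorical encoding, and NaN-preserving normalization described for this class can be sketched as follows. This is an assumption-laden illustration, not the dsipts code: the helpers `chunked_mean_std` and `encode_and_normalize` are hypothetical, z-score normalization and integer category codes are assumed choices, and the sum-of-squares variance is numerically naive.

```python
import io
import math

import pandas as pd


def chunked_mean_std(csv_source, column, chunk_size=10000):
    """Accumulate mean and population std of one column over CSV chunks."""
    count, total, total_sq = 0, 0.0, 0.0
    for chunk in pd.read_csv(csv_source, chunksize=chunk_size):
        values = chunk[column].dropna()  # NaN is skipped in the statistics only
        count += len(values)
        total += float(values.sum())
        total_sq += float((values ** 2).sum())
    mean = total / count
    var = max(total_sq / count - mean ** 2, 0.0)
    return mean, math.sqrt(var)


def encode_and_normalize(df, cat_col, num_col, mean, std):
    """Integer-encode one categorical column and z-score one numeric column."""
    out = df.copy()
    codes, categories = pd.factorize(out[cat_col])
    out[cat_col] = codes
    # Arithmetic on NaN yields NaN, so missing values pass through untouched.
    out[num_col] = (out[num_col] - mean) / std
    return out, list(categories)


csv_text = "g,v\na,1.0\na,\nb,3.0\nb,5.0\n"
mean, std = chunked_mean_std(io.StringIO(csv_text), "v", chunk_size=2)
encoded, categories = encode_and_normalize(
    pd.read_csv(io.StringIO(csv_text)), "g", "v", mean, std
)
```

Only one chunk is held in memory while the statistics are accumulated, and the missing value in `v` is still NaN after normalization, so a later layer can decide how to treat it.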

__init__(file_paths, group_cols, time_col, feature_cols, target_cols, static_cols=None, cat_cols=None, num_cols=None, known_cols=None, unknown_cols=None, weights=None, memory_efficient=False, chunk_size=10000)

Initialize the MultiSourceTSDataSet.

__len__()

Return the number of file-group combinations in the dataset.

__getitem__(idx)

Get data for a specific file-group combination by index.

This method:

1. Maps the index to a specific file-group combination
2. Loads all data for that combination
3. Converts data to appropriate formats for model consumption

Parameters:

idx – Index of the file-group combination to retrieve

Returns:

Dictionary containing group data with keys:

'x': Feature tensor
'y': Target tensor
't': Time values (as numpy array)
'w': Weight tensor
'group_id': Group identifier
'st': Static features

Return type:

Dict
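The first step of `__getitem__`, mapping a flat index to a file-group combination, could be implemented along these lines. This is a sketch under assumptions rather than the library's actual bookkeeping: the class `FileGroupIndex` and its `locate` method are hypothetical, and the groups found in each file are assumed to be known in advance.

```python
from bisect import bisect_right
from itertools import accumulate


class FileGroupIndex:
    """Hypothetical helper mapping a flat index to (file_path, group_id)."""

    def __init__(self, groups_per_file):
        # groups_per_file: dict of file path -> list of group ids in that file.
        self.files = list(groups_per_file)
        self.groups = [list(groups_per_file[f]) for f in self.files]
        # Cumulative group counts, e.g. [2, 3] for files with 2 and 1 groups.
        self.offsets = list(accumulate(len(g) for g in self.groups))

    def __len__(self):
        # Total number of file-group combinations.
        return self.offsets[-1] if self.offsets else 0

    def locate(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # First file whose cumulative count exceeds idx.
        file_pos = bisect_right(self.offsets, idx)
        start = self.offsets[file_pos - 1] if file_pos else 0
        return self.files[file_pos], self.groups[file_pos][idx - start]


index = FileGroupIndex({"a.csv": ["g1", "g2"], "b.csv": ["g1"]})
third = index.locate(2)
```

With two groups in `a.csv` and one in `b.csv`, the length is 3 and index 2 resolves to the single group in `b.csv`. The binary search over cumulative counts keeps the lookup cheap even with many files.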

get_metadata()

Return metadata about the dataset.

Returns:

Dictionary containing metadata about columns and their properties

Return type:

Dict