dsipts.data_structure.time_series_d2 module
Time Series D2 Layer Module
This module provides the D2 layer for time series data processing:
- TSDataModule: LightningDataModule for time series data with support for training, validation, and testing
- TimeSeriesSubset: Subset implementation for train/val/test splits
- custom_collate_fn: Custom collate function for handling mixed data types
Key Features:
- Creates sliding windows from time series data
- Handles train/validation/test splits (percentage-based or group-based)
- Validates data points based on minimum valid requirements
- Creates DataLoaders for PyTorch Lightning integration
- Efficiently manages memory with caching mechanisms
- class dsipts.data_structure.time_series_d2.TSDataModule(d1_dataset, past_len, future_len, batch_size=32, min_valid_length=None, split_method='percentage', split_config=None, num_workers=0, sampler=None, memory_efficient=False, known_cols=None, unknown_cols=None, precompute=True)
Bases: LightningDataModule
D2 Layer - Processes time series data for model consumption.
This module:
1. Creates sliding windows from time series data
2. Handles train/validation/test splits
3. Creates DataLoaders for PyTorch Lightning
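The sliding-window step can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helper name `sliding_windows` and the stride of one time step are assumptions made for illustration, and only `past_len` and `future_len` come from the documented signature.

```python
def sliding_windows(values, past_len, future_len):
    """Yield (past, future) pairs, advancing one time step at a time.

    Hypothetical helper: the stride of 1 is an assumption, not documented behavior.
    """
    window = past_len + future_len
    # A series shorter than one full window yields nothing.
    for start in range(len(values) - window + 1):
        split = start + past_len
        yield values[start:split], values[split:start + window]


series = [10, 11, 12, 13, 14, 15]
windows = list(sliding_windows(series, past_len=3, future_len=2))
for past, future in windows:
    print(past, "->", future)
```

Each window pairs `past_len` input steps with the `future_len` steps that immediately follow them, so a series of length `n` gives at most `n - (past_len + future_len) + 1` windows.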
Initialize the TSDataModule.
- Parameters:
d1_dataset (MultiSourceTSDataSet) – The D1 dataset instance
past_len (int) – Number of past time steps for input
future_len (int) – Number of future time steps for prediction
batch_size (int) – Batch size for DataLoaders
min_valid_length (int | None) – Minimum number of valid points required in a window
split_method (str) – Method for splitting data ('percentage' or 'group')
split_config (tuple | None) – Configuration for the split
num_workers (int) – Number of workers for DataLoader
sampler (Sampler | None) – Optional custom sampler for the DataLoader
memory_efficient (bool) – Whether to use memory-efficient mode
known_cols (List[str] | None) – Columns that are known at prediction time (overrides D1 dataset settings)
unknown_cols (List[str] | None) – Columns that are unknown at prediction time (overrides D1 dataset settings)
precompute (bool) – Whether to precompute valid indices and create datasets
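To make `min_valid_length` and the percentage-based split more concrete, here is a hedged sketch in plain Python. The helper names, the treatment of `None`/NaN as invalid points, the fallback to a fully valid window when `min_valid_length` is `None`, the chronological (unshuffled) split, and the three-fraction form of the split configuration are all assumptions. The source does not specify how `split_config` is interpreted.

```python
import math


def valid_window_starts(values, past_len, future_len, min_valid_length=None):
    """Return start indices of windows holding enough non-missing points."""
    window = past_len + future_len
    if min_valid_length is None:
        # Assumption: without an explicit minimum, require a fully valid window.
        min_valid_length = window
    starts = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        n_valid = sum(1 for v in chunk if v is not None and not math.isnan(v))
        if n_valid >= min_valid_length:
            starts.append(start)
    return starts


def percentage_split(indices, fractions):
    """Split indices in order into train/val/test by fractions (hypothetical layout)."""
    n_train = int(len(indices) * fractions[0])
    n_val = int(len(indices) * fractions[1])
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],  # the remainder goes to the test split
    )


series = [1.0, 2.0, float("nan"), 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
starts = valid_window_starts(series, past_len=2, future_len=1, min_valid_length=3)
train, val, test = percentage_split(starts, fractions=(0.6, 0.2, 0.2))
print(starts, train, val, test)
```

Filtering before splitting keeps the split proportions relative to usable windows rather than to raw time steps. Whether the library orders these two steps the same way is not stated in the source.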
- __init__(d1_dataset, past_len, future_len, batch_size=32, min_valid_length=None, split_method='percentage', split_config=None, num_workers=0, sampler=None, memory_efficient=False, known_cols=None, unknown_cols=None, precompute=True)
Initialize the TSDataModule.
- __getitem__(idx)
Get a time series window by global index.
This method:
1. Maps the global index to a specific group and local index
2. Extracts the window from the group data
3. Returns the window in a format suitable for model training
- Parameters:
idx – Global index of the window to retrieve
- Returns:
Dictionary containing:
past_features: Tensor of past features
past_time: Array of past time points
future_targets: Tensor of future targets
future_time: Array of future time points
group_id: Group identifier
static: Static features tensor
- Return type:
dict
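The first step of `__getitem__`, mapping a global index to a group and a local index, can be sketched with cumulative window counts and a binary search. This is an illustrative assumption about one common way to do the mapping, not the library's code. The function `locate` and the `windows_per_group` dictionary are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(global_idx, windows_per_group):
    """Map a global window index to (group_id, local_idx).

    `windows_per_group` is a hypothetical dict: group id -> number of windows.
    """
    group_ids = list(windows_per_group)
    # offsets[i] is the total number of windows in groups 0..i (inclusive).
    offsets = list(accumulate(windows_per_group[g] for g in group_ids))
    total = offsets[-1] if offsets else 0
    if not 0 <= global_idx < total:
        raise IndexError(f"index {global_idx} out of range for {total} windows")
    # First group whose cumulative count exceeds the global index.
    pos = bisect_right(offsets, global_idx)
    local_idx = global_idx - (offsets[pos - 1] if pos else 0)
    return group_ids[pos], local_idx


counts = {"A": 3, "B": 2, "C": 4}
print(locate(0, counts), locate(3, counts), locate(8, counts))
```

Once the group and local index are known, the window can be sliced from that group's data and packed into the dictionary described above.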
- class dsipts.data_structure.time_series_d2.TimeSeriesSubset(data_module, indices)
Bases: Dataset
Subset of a D2 processor dataset that implements the Dataset interface.
Initialize the TimeSeriesSubset.
- Parameters:
data_module – The TSDataModule instance (stored as reference, not copy)
indices – List of indices to include in this subset
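The idea of a subset that holds a reference to its parent plus a list of indices can be sketched as below. `IndexSubset` is a hypothetical stand-in written without PyTorch so it runs on its own. It shows the general pattern only and is not the `TimeSeriesSubset` source.

```python
class IndexSubset:
    """Hypothetical stand-in for a subset view over an indexable parent."""

    def __init__(self, parent, indices):
        self.parent = parent          # stored by reference, never copied
        self.indices = list(indices)  # positions in the parent that belong to this split

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Translate the subset-local position into the parent's index space.
        return self.parent[self.indices[i]]


parent = ["w0", "w1", "w2", "w3", "w4"]  # placeholder for a windowed dataset
train = IndexSubset(parent, [0, 1, 2])
val = IndexSubset(parent, [3, 4])
print(len(train), val[0])
```

Because each split stores only indices, the train, validation, and test subsets share the same underlying data rather than holding separate copies.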