dsipts.data_structure.time_series_d2 module

Time Series D2 Layer Module

This module provides the D2 layer for time series data processing:

  • TSDataModule: LightningDataModule for time series data with support for training, validation, and testing

  • TimeSeriesSubset: Subset implementation for train/val/test splits

  • custom_collate_fn: Custom collate function for handling mixed data types

Key Features:

  • Creates sliding windows from time series data

  • Handles train/validation/test splits (percentage-based or group-based)

  • Validates data points based on minimum valid requirements

  • Creates DataLoaders for PyTorch Lightning integration

  • Efficiently manages memory with caching mechanisms

class dsipts.data_structure.time_series_d2.TSDataModule(d1_dataset, past_len, future_len, batch_size=32, min_valid_length=None, split_method='percentage', split_config=None, num_workers=0, sampler=None, memory_efficient=False, known_cols=None, unknown_cols=None, precompute=True)[source]

Bases: LightningDataModule

D2 Layer - Processes time series data for model consumption.

This module:

  1. Creates sliding windows from time series data

  2. Handles train/validation/test splits

  3. Creates DataLoaders for PyTorch Lightning
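As a plain-Python sketch (not the library's code), the number of sliding windows a series yields follows directly from `past_len` and `future_len`: each window spans `past_len + future_len` consecutive steps.

```python
def num_windows(series_len, past_len, future_len, stride=1):
    """Count full sliding windows of size past_len + future_len.

    Illustrative sketch only: the actual TSDataModule additionally
    filters windows by min_valid_length, so it may keep fewer.
    """
    window = past_len + future_len
    if series_len < window:
        return 0
    return (series_len - window) // stride + 1

# e.g. a series of 100 steps, 24 past + 12 future, stride 1
print(num_windows(100, past_len=24, future_len=12))  # → 65
```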

Initialize the TSDataModule.

Parameters:
  • d1_dataset (MultiSourceTSDataSet) – The D1 dataset instance (MultiSourceTSDataSet)

  • past_len (int) – Number of past time steps for input

  • future_len (int) – Number of future time steps for prediction

  • batch_size (int) – Batch size for DataLoaders

  • min_valid_length (int | None) – Minimum number of valid points required in a window

  • split_method (str) – Method for splitting data (‘percentage’ or ‘group’)

  • split_config (tuple | None) – Configuration for the split

  • num_workers (int) – Number of workers for DataLoader

  • sampler (Sampler | None) – Optional custom sampler for the DataLoader

  • memory_efficient (bool) – Whether to use memory-efficient mode

  • known_cols (List[str] | None) – Columns that are known at prediction time (overrides D1 dataset settings)

  • unknown_cols (List[str] | None) – Columns that are unknown at prediction time (overrides D1 dataset settings)

  • precompute (bool) – Whether to precompute valid indices and create datasets
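To illustrate the `split_method='percentage'` path, here is a minimal sketch under the assumption that `split_config` is a `(train, val, test)` fraction tuple; the real TSDataModule's configuration format may differ.

```python
def percentage_split(n_samples, split_config=(0.7, 0.15, 0.15)):
    """Split window indices into contiguous train/val/test ranges.

    Hypothetical sketch: assumes split_config holds
    (train, val, test) fractions for split_method='percentage'.
    """
    train_frac, val_frac, _ = split_config
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    idx = list(range(n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = percentage_split(100)
print(len(train), len(val), len(test))  # 70 15 15
```

A group-based split would instead assign whole groups (e.g. whole sensors or series) to each partition, which avoids leakage between adjacent windows of the same series.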

__init__(d1_dataset, past_len, future_len, batch_size=32, min_valid_length=None, split_method='percentage', split_config=None, num_workers=0, sampler=None, memory_efficient=False, known_cols=None, unknown_cols=None, precompute=True)[source]

Initialize the TSDataModule.

Parameters are the same as documented for the class above.

__len__()[source]

Return the number of valid samples in the dataset.

__getitem__(idx)[source]

Get a time series window by global index.

This method:

  1. Maps the global index to a specific group and local index

  2. Extracts the window from the group data

  3. Returns the window in a format suitable for model training

Parameters:

idx – Global index of the window to retrieve

Returns:

A dictionary containing:

  • past_features: Tensor of past features

  • past_time: Array of past time points

  • future_targets: Tensor of future targets

  • future_time: Array of future time points

  • group_id: Group identifier

  • static: Static features tensor

Return type:

dict
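The sample returned by `__getitem__` can be pictured as a plain dictionary. The sketch below uses Python lists in place of tensors/arrays; the helper name and its arguments are illustrative, not the module's internals.

```python
def make_window(values, times, group_id, static, start, past_len, future_len):
    """Build a sample dict with the same keys __getitem__ returns.

    Plain-Python sketch: lists stand in for torch tensors and
    numpy arrays, and a single series stands in for a group.
    """
    split = start + past_len
    return {
        "past_features": values[start:split],
        "past_time": times[start:split],
        "future_targets": values[split:split + future_len],
        "future_time": times[split:split + future_len],
        "group_id": group_id,
        "static": static,
    }

values = list(range(10))
times = [f"t{i}" for i in range(10)]
sample = make_window(values, times, "sensor_a", [1.0],
                     start=2, past_len=3, future_len=2)
print(sample["past_features"], sample["future_targets"])  # [2, 3, 4] [5, 6]
```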

setup(stage=None)[source]

Prepare data for the given stage.

Parameters:

stage – Either ‘fit’ or ‘test’

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

test_dataloader()[source]

Return a DataLoader for testing.

class dsipts.data_structure.time_series_d2.TimeSeriesSubset(data_module, indices)[source]

Bases: Dataset

Subset of a D2 processor dataset that implements the Dataset interface.

Initialize the TimeSeriesSubset.

Parameters:
  • data_module – The TSDataModule instance (stored as reference, not copy)

  • indices – List of indices to include in this subset

__init__(data_module, indices)[source]

Initialize the TimeSeriesSubset.

Parameters are the same as documented for the class above.

__len__()[source]

Return the number of samples in this subset.

__getitem__(idx)[source]

Get a sample from the data module using the mapped index.
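The subset's index mapping can be sketched in a few lines: a local index is translated into a global index in the parent data module, which is held by reference rather than copied. This is a minimal illustration, not the library's implementation.

```python
class Subset:
    """Minimal sketch of TimeSeriesSubset's index mapping:
    local index -> global index in the parent data module."""

    def __init__(self, data_module, indices):
        self.data_module = data_module  # stored as a reference, not a copy
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        # Map the subset-local index to the parent's global index
        return self.data_module[self.indices[idx]]

parent = list(range(100, 200))  # stands in for a TSDataModule
val_subset = Subset(parent, [5, 17, 42])
print(len(val_subset), val_subset[1])  # 3 117
```

Holding a reference keeps the three splits cheap: train, val, and test subsets share the parent's cached windows instead of duplicating them.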

dsipts.data_structure.time_series_d2.custom_collate_fn(batch)[source]

Custom collate function for the DataLoader that handles mixed data types, such as static features that may be objects or other non-tensor types.
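A plain-Python sketch of the idea: tensor-like fields would be stacked along a new batch dimension (here represented as nested lists; torch's `default_collate` does this for real tensors), while object-typed fields such as `group_id` are simply collected into a list instead of being forced into a tensor.

```python
def collate(batch):
    """Sketch of a mixed-type collate function.

    Collects each key across the batch into a list; for real
    tensors, torch's default_collate would stack them instead,
    while object fields must stay as plain Python lists.
    """
    out = {}
    for key in batch[0]:
        out[key] = [sample[key] for sample in batch]
    return out

batch = [
    {"past_features": [1, 2], "group_id": "a"},
    {"past_features": [3, 4], "group_id": "b"},
]
merged = collate(batch)
print(merged["group_id"])  # ['a', 'b']
```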