dsipts.data_structure.time_series_d2 module

Time Series D2 Layer Module

This module provides the D2 layer for time series data processing:

  • TSDataModule: LightningDataModule for time series data with support for training, validation, and testing

  • TimeSeriesSubset: Subset implementation for train/val/test splits

  • custom_collate_fn: Custom collate function for handling mixed data types

Key Features:

  • Creates sliding windows from time series data

  • Handles train/validation/test splits (percentage-based or group-based)

  • Validates data points based on minimum valid requirements

  • Creates DataLoaders for PyTorch Lightning integration

  • Efficiently manages memory with caching mechanisms

class dsipts.data_structure.time_series_d2.TSDataModule(d1_dataset, past_len, future_len, batch_size=32, min_valid_length=None, split_method='percentage', split_config=None, num_workers=0, sampler=None, memory_efficient=False, known_cols=None, unknown_cols=None, precompute=True)[source]

Bases: LightningDataModule

D2 Layer - Processes time series data for model consumption.

This module:

  1. Creates sliding windows from time series data

  2. Handles train/validation/test splits

  3. Creates DataLoaders for PyTorch Lightning
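As a plain-Python sketch (not the library's code), the number of sliding windows a series yields follows directly from `past_len` and `future_len`: each window spans `past_len + future_len` consecutive steps.

```python
def num_windows(series_len, past_len, future_len, stride=1):
    """Count full sliding windows of size past_len + future_len.

    Illustrative sketch only: the actual TSDataModule additionally
    filters windows by min_valid_length, so it may keep fewer.
    """
    window = past_len + future_len
    if series_len < window:
        return 0
    return (series_len - window) // stride + 1

# e.g. a series of 100 steps, 24 past + 12 future, stride 1
print(num_windows(100, past_len=24, future_len=12))  # → 65
```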

Initialize the TSDataModule.

Parameters:
  • d1_dataset (MultiSourceTSDataSet) – The D1 dataset instance (MultiSourceTSDataSet)

  • past_len (int) – Number of past time steps for input

  • future_len (int) – Number of future time steps for prediction

  • batch_size (int) – Batch size for DataLoaders

  • min_valid_length (int | None) – Minimum number of valid points required in a window

  • split_method (str) – Method for splitting data (‘percentage’ or ‘group’)

  • split_config (tuple | None) – Configuration for the split

  • num_workers (int) – Number of workers for DataLoader

  • sampler (Sampler | None) – Optional custom sampler for the DataLoader

  • memory_efficient (bool) – Whether to use memory-efficient mode

  • known_cols (List[str] | None) – Columns that are known at prediction time (overrides D1 dataset settings)

  • unknown_cols (List[str] | None) – Columns that are unknown at prediction time (overrides D1 dataset settings)

  • precompute (bool) – Whether to precompute valid indices and create datasets
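To illustrate the `split_method='percentage'` path, here is a minimal sketch under the assumption that `split_config` is a `(train, val, test)` fraction tuple; the real TSDataModule's configuration format may differ.

```python
def percentage_split(n_samples, split_config=(0.7, 0.15, 0.15)):
    """Split window indices into contiguous train/val/test ranges.

    Hypothetical sketch: assumes split_config holds
    (train, val, test) fractions for split_method='percentage'.
    """
    train_frac, val_frac, _ = split_config
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    idx = list(range(n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = percentage_split(100)
print(len(train), len(val), len(test))  # 70 15 15
```

A group-based split would instead assign whole groups (e.g. whole sensors or series) to each partition, which avoids leakage between adjacent windows of the same series.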

__init__(d1_dataset, past_len, future_len, batch_size=32, min_valid_length=None, split_method='percentage', split_config=None, num_workers=0, sampler=None, memory_efficient=False, known_cols=None, unknown_cols=None, precompute=True)[source]

Initialize the TSDataModule.

Parameters are the same as documented for the class above.

__len__()[source]

Return the number of valid samples in the dataset.

__getitem__(idx)[source]

Get a time series window by global index.

This method:

  1. Maps the global index to a specific group and local index

  2. Extracts the window from the group data

  3. Returns the window in a format suitable for model training

Parameters:

idx – Global index of the window to retrieve

Returns:

A dictionary containing:

  • past_features: Tensor of past features

  • past_time: Array of past time points

  • future_targets: Tensor of future targets

  • future_time: Array of future time points

  • group_id: Group identifier

  • static: Static features tensor

Return type:

dict
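The sample returned by `__getitem__` can be pictured as a plain dictionary. The sketch below uses Python lists in place of tensors/arrays; the helper name and its arguments are illustrative, not the module's internals.

```python
def make_window(values, times, group_id, static, start, past_len, future_len):
    """Build a sample dict with the same keys __getitem__ returns.

    Plain-Python sketch: lists stand in for torch tensors and
    numpy arrays, and a single series stands in for a group.
    """
    split = start + past_len
    return {
        "past_features": values[start:split],
        "past_time": times[start:split],
        "future_targets": values[split:split + future_len],
        "future_time": times[split:split + future_len],
        "group_id": group_id,
        "static": static,
    }

values = list(range(10))
times = [f"t{i}" for i in range(10)]
sample = make_window(values, times, "sensor_a", [1.0],
                     start=2, past_len=3, future_len=2)
print(sample["past_features"], sample["future_targets"])  # [2, 3, 4] [5, 6]
```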

setup(stage=None)[source]

Prepare data for the given stage.

Parameters:

stage – Either ‘fit’ or ‘test’

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

test_dataloader()[source]

Return a DataLoader for testing.

class dsipts.data_structure.time_series_d2.TimeSeriesSubset(data_module, indices)[source]

Bases: Dataset

Subset of a D2 processor dataset that implements the Dataset interface.

Initialize the TimeSeriesSubset.

Parameters:
  • data_module – The TSDataModule instance (stored as reference, not copy)

  • indices – List of indices to include in this subset

__init__(data_module, indices)[source]

Initialize the TimeSeriesSubset.

Parameters are the same as documented for the class above.

__len__()[source]

Return the number of samples in this subset.

__getitem__(idx)[source]

Get a sample from the data module using the mapped index.
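The subset's index mapping can be sketched in a few lines: a local index is translated into a global index in the parent data module, which is held by reference rather than copied. This is a minimal illustration, not the library's implementation.

```python
class Subset:
    """Minimal sketch of TimeSeriesSubset's index mapping:
    local index -> global index in the parent data module."""

    def __init__(self, data_module, indices):
        self.data_module = data_module  # stored as a reference, not a copy
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        # Map the subset-local index to the parent's global index
        return self.data_module[self.indices[idx]]

parent = list(range(100, 200))  # stands in for a TSDataModule
val_subset = Subset(parent, [5, 17, 42])
print(len(val_subset), val_subset[1])  # 3 117
```

Holding a reference keeps the three splits cheap: train, val, and test subsets share the parent's cached windows instead of duplicating them.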

dsipts.data_structure.time_series_d2.custom_collate_fn(batch)[source]

Custom collate function for the DataLoader that handles mixed data types, such as static features that may be objects or other non-tensor types.
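A plain-Python sketch of the idea: tensor-like fields would be stacked along a new batch dimension (here represented as nested lists; torch's `default_collate` does this for real tensors), while object-typed fields such as `group_id` are simply collected into a list instead of being forced into a tensor.

```python
def collate(batch):
    """Sketch of a mixed-type collate function.

    Collects each key across the batch into a list; for real
    tensors, torch's default_collate would stack them instead,
    while object fields must stay as plain Python lists.
    """
    out = {}
    for key in batch[0]:
        out[key] = [sample[key] for sample in batch]
    return out

batch = [
    {"past_features": [1, 2], "group_id": "a"},
    {"past_features": [3, 4], "group_id": "b"},
]
merged = collate(batch)
print(merged["group_id"])  # ['a', 'b']
```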