dsipts.models.Diffusion module

class dsipts.models.Diffusion.Diffusion(d_model, out_channels, past_steps, future_steps, past_channels, future_channels, embs, learn_var, cosine_alpha, diffusion_steps, beta, gamma, n_layers_RNN, d_head, n_head, dropout_rate, activation, subnet, perc_subnet_learning_for_step, persistence_weight=0.0, loss_type='l1', quantiles=[], optim=None, optim_config=None, scheduler_config=None, **kwargs)[source]

Bases: Base

Denoising Diffusion Probabilistic Model

Parameters:
  • d_model (int)

  • out_channels (int) – number of target variables

  • past_steps (int) – size of past window

  • future_steps (int) – size of future window to be predicted

  • past_channels (int) – number of variables available for the past context

  • future_channels (int) – number of variables known in the future, available for forecasting

  • embs (list[int]) – categorical variables dimensions for embeddings

  • learn_var (bool) – if True, the model learns the posterior variance; otherwise it uses the closed-form variance of the posterior distribution

  • cosine_alpha (bool) – if True, generate the alphas and betas with a cosine schedule

  • diffusion_steps (int) – number of noising steps for the initial sample

  • beta (float) – initial value used to generate the diffusion perturbations. Ignored if cosine_alpha == True

  • gamma (float) – trade-off weight balancing the noise-prediction loss against the Negative Log-Likelihood / KL-divergence term

  • n_layers_RNN (int) – param for subnet

  • d_head (int) – param for subnet

  • n_head (int) – param for subnet

  • dropout_rate (float) – param for subnet

  • activation (str) – param for subnet

  • subnet (int) – 1 for the attention subnet, 2 for the linear subnet. Other subnets can be added

  • perc_subnet_learning_for_step (float) – fraction of subnets to train on each batch. Decrease this value if the loss blows up.

  • persistence_weight (float, optional) – Defaults to 0.0.

  • loss_type (str, optional) – Defaults to ‘l1’.

  • quantiles (List[float], optional) – Only [] accepted. Defaults to [].

  • optim (Union[str,None], optional) – Defaults to None.

  • optim_config (Union[dict,None], optional) – Defaults to None.

  • scheduler_config (Union[dict,None], optional) – Defaults to None.
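The schedule parameters above (beta, cosine_alpha, diffusion_steps) can be illustrated with a standalone sketch. The function names and the linear default below are assumptions for illustration, not the library's API:

```python
import numpy as np

def linear_betas(diffusion_steps, beta_start=1e-4, beta_end=0.02):
    # Linear schedule: betas grow linearly from beta_start to beta_end.
    return np.linspace(beta_start, beta_end, diffusion_steps)

def cosine_alpha_bars(diffusion_steps, s=0.008):
    # Cosine schedule (Nichol & Dhariwal, 2021): alpha_bar follows a
    # squared-cosine curve and the per-step betas are derived from it,
    # which is presumably what cosine_alpha=True selects.
    t = np.linspace(0, diffusion_steps, diffusion_steps + 1) / diffusion_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return alpha_bar[1:], np.clip(betas, 0.0, 0.999)
```

With cosine_alpha == True the beta argument is ignored and the betas come entirely from the cosine curve.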

handle_multivariate = False
handle_future_covariates = True
handle_categorical_variables = True
handle_quantile_loss = False
description = 'Can NOT  handle multivariate output \nCan   handle future covariates\nCan   handle categorical covariates\nCan NOT  handle Quantile loss function'
__init__(d_model, out_channels, past_steps, future_steps, past_channels, future_channels, embs, learn_var, cosine_alpha, diffusion_steps, beta, gamma, n_layers_RNN, d_head, n_head, dropout_rate, activation, subnet, perc_subnet_learning_for_step, persistence_weight=0.0, loss_type='l1', quantiles=[], optim=None, optim_config=None, scheduler_config=None, **kwargs)[source]

Denoising Diffusion Probabilistic Model

Parameters:
  • d_model (int)

  • out_channels (int) – number of target variables

  • past_steps (int) – size of past window

  • future_steps (int) – size of future window to be predicted

  • past_channels (int) – number of variables available for the past context

  • future_channels (int) – number of variables known in the future, available for forecasting

  • embs (list[int]) – categorical variables dimensions for embeddings

  • learn_var (bool) – if True, the model learns the posterior variance; otherwise it uses the closed-form variance of the posterior distribution

  • cosine_alpha (bool) – if True, generate the alphas and betas with a cosine schedule

  • diffusion_steps (int) – number of noising steps for the initial sample

  • beta (float) – initial value used to generate the diffusion perturbations. Ignored if cosine_alpha == True

  • gamma (float) – trade-off weight balancing the noise-prediction loss against the Negative Log-Likelihood / KL-divergence term

  • n_layers_RNN (int) – param for subnet

  • d_head (int) – param for subnet

  • n_head (int) – param for subnet

  • dropout_rate (float) – param for subnet

  • activation (str) – param for subnet

  • subnet (int) – 1 for the attention subnet, 2 for the linear subnet. Other subnets can be added

  • perc_subnet_learning_for_step (float) – fraction of subnets to train on each batch. Decrease this value if the loss blows up.

  • persistence_weight (float, optional) – Defaults to 0.0.

  • loss_type (str, optional) – Defaults to ‘l1’.

  • quantiles (List[float], optional) – Only [] accepted. Defaults to [].

  • optim (Union[str,None], optional) – Defaults to None.

  • optim_config (Union[dict,None], optional) – Defaults to None.

  • scheduler_config (Union[dict,None], optional) – Defaults to None.

forward(batch)[source]

Training step of the diffusion network

Parameters:

batch (dict) – variables loaded

Returns:

total loss for the noise predictions over all sampled subnets

Return type:

float
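The gamma trade-off described in the parameters can be sketched as a hybrid loss; MSE is used here purely for illustration, while the model's loss_type may be 'l1':

```python
import numpy as np

def hybrid_loss(eps, eps_hat, vlb_term, gamma):
    # Loss on the predicted noise plus the NLL/KL term, weighted by gamma.
    noise_loss = np.mean((eps - eps_hat) ** 2)
    return noise_loss + gamma * vlb_term
```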

inference(batch)[source]

Inference process to forecast future y

Parameters:

batch (dict) – Keys checked: [‘x_num_past’, ‘idx_target’, ‘x_num_future’, ‘x_cat_past’, ‘x_cat_future’]

Returns:

generated sequence [batch_size, future_steps, num_var]

Return type:

torch.Tensor
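A single reverse step of the inference loop can be sketched in closed form (standard DDPM ancestral sampling; this is an illustration, not the module's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_sample_step(x_t, eps_hat, t, betas, alpha_bars):
    # Estimate x_0 from the predicted noise, then sample x_{t-1}
    # from the posterior q(x_{t-1} | x_t, x_0).
    alpha_t = 1.0 - betas[t]
    a_bar_t = alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    x0_hat = (x_t - np.sqrt(1.0 - a_bar_t) * eps_hat) / np.sqrt(a_bar_t)
    mean = (np.sqrt(a_bar_prev) * betas[t] / (1.0 - a_bar_t)) * x0_hat \
         + (np.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)) * x_t
    var = betas[t] * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(var) * noise
```

Iterating this from t = diffusion_steps - 1 down to 0 yields the generated future sequence.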

cat_categorical_vars(batch)[source]

Extract the categorical context for past and future

Parameters:

batch (dict) – Keys checked -> [‘x_cat_past’, ‘x_cat_future’]

Returns:

cat_emb_past, cat_emb_fut

Return type:

List[torch.Tensor, torch.Tensor]

remove_var(tensor, indexes_to_exclude, dimension)[source]

Remove variables from a tensor at chosen positions along a given dimension

Parameters:
  • tensor (torch.Tensor) – starting tensor

  • indexes_to_exclude (list) – indexes along the chosen dimension that we want to exclude

  • dimension (int) – the single dimension of the tensor to operate on (not a list of dimensions)

Returns:

new tensor without the chosen variables

Return type:

torch.Tensor
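The semantics of remove_var can be illustrated with a numpy analog (illustration only, not the actual implementation):

```python
import numpy as np

# A [batch, steps, vars] tensor with 4 variables on the last dimension.
x = np.arange(24).reshape(2, 3, 4)

# Drop variables 1 and 3 along dimension 2, keeping variables 0 and 2.
reduced = np.delete(x, [1, 3], axis=2)
```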

improving_weight_during_training()[source]

Each time we sample from the multinomial distribution we subtract the minimum weight for more precise sampling, avoiding large learning differences among subnets.

This leads to more stable inference even in early training, mainly for the common context embedding.

Since the sampling weights must be strictly positive, we subtract (min - 1).
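A minimal sketch of the weight shift (illustrative names; the actual attribute layout may differ):

```python
import numpy as np

def shifted_sampling_probs(weights):
    # Subtract (min - 1) so every weight stays strictly positive,
    # then normalize into multinomial sampling probabilities.
    w = np.asarray(weights, dtype=float)
    w = w - (w.min() - 1.0)
    return w / w.sum()
```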

q_sample(x_start, t)[source]

Diffuse x_start for t diffusion steps.

In other words, sample from q(x_t | x_0).

Also, compute the mean and variance of the diffusion posterior:

q(x_{t-1} | x_t, x_0)

The posterior mean and variance are the quantities to be predicted

Parameters:
  • x_start (torch.Tensor) – values to be predicted

  • t (int) – diffusion step

Returns:

the noised sample, the posterior mean, the posterior log variance and the actual noise

Return type:

List[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
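The closed-form forward noising and the posterior statistics can be sketched as follows (standard DDPM formulas; an illustration rather than the module's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)

def q_sample(x_start, t, alpha_bars):
    # Sample x_t ~ q(x_t | x_0) in closed form.
    eps = rng.standard_normal(x_start.shape)
    x_t = np.sqrt(alpha_bars[t]) * x_start + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def posterior_mean_logvar(x_start, x_t, t, betas, alpha_bars):
    # Mean and log-variance of q(x_{t-1} | x_t, x_0).
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef1 = np.sqrt(a_bar_prev) * betas[t] / (1.0 - alpha_bars[t])
    coef2 = np.sqrt(1.0 - betas[t]) * (1.0 - a_bar_prev) / (1.0 - alpha_bars[t])
    var = betas[t] * (1.0 - a_bar_prev) / (1.0 - alpha_bars[t])
    return coef1 * x_start + coef2 * x_t, np.log(var)
```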

normal_kl(mean1, logvar1, mean2, logvar2)[source]

Compute the KL divergence between two Gaussians, also called relative entropy. The KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P: KL(P||Q) = E_P[log(P/Q)].

In the context of machine learning, KL(P||Q) is often called the ‘information gain’ achieved if P were used instead of the currently used Q.

Shapes are automatically broadcast, so batches can be compared to scalars, among other use cases.
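The closed-form KL between two diagonal Gaussians, as a sketch of what this method presumably computes:

```python
import numpy as np

def normal_kl(mean1, logvar1, mean2, logvar2):
    # KL( N(mean1, exp(logvar1)) || N(mean2, exp(logvar2)) ), elementwise.
    # numpy broadcasting lets batches be compared against scalars.
    return 0.5 * (logvar2 - logvar1
                  + np.exp(logvar1 - logvar2)
                  + (mean1 - mean2) ** 2 * np.exp(-logvar2)
                  - 1.0)
```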

gaussian_likelihood(x, mean, var)[source]
gaussian_log_likelihood(x, mean, var)[source]
class dsipts.models.Diffusion.SubNet1(aux_past_ch, aux_fut_ch, learn_var, output_channel, d_model, d_head, n_head, activation, dropout_rate)[source]

Bases: Module

-> SUBNET of the DIFFUSION MODEL (DDPM)

It starts with an autoregressive LSTM network that computes epsilon, which is then subtracted from the ‘y_noised’ tensor. This is always possible! We thus obtain an approximation of ‘eps_hat’, which at the end passes through a residual connection with its embedded version ‘emb_eps_hat’.

‘emb_eps_hat’ is updated according to the available categorical information of the series: an attention network compares the past categorical values with the future ones to update the embedded predicted noise.

Also, if auxiliary numerical variables are available both in the past and in the future, the changes in these variables are captured by another attention network.

The goal is to always ensure a valuable computation of ‘eps’, and then refine it whenever enough data is available. Both attentions use { Q = *_future, K = *_past, V = y_past }, exploiting as many context variables as possible for better updates.

Parameters:
  • learn_var (bool) – set if the network has to learn the optimal variance of each step

  • output_channel (int) – number of variables to be predicted

  • future_steps (int) – number of steps in the future, i.e. the number of timesteps to be predicted

  • d_model (int) – hidden dimension of the model

  • num_layers_RNN (int) – number of layers for autoregressive prediction

  • d_head (int) – hidden dimension of each head in the attention networks

  • n_head (int) – number of heads in the attention networks

  • dropout_rate (float)

__init__(aux_past_ch, aux_fut_ch, learn_var, output_channel, d_model, d_head, n_head, activation, dropout_rate)[source]

-> SUBNET of the DIFFUSION MODEL (DDPM)

It starts with an autoregressive LSTM network that computes epsilon, which is then subtracted from the ‘y_noised’ tensor. This is always possible! We thus obtain an approximation of ‘eps_hat’, which at the end passes through a residual connection with its embedded version ‘emb_eps_hat’.

‘emb_eps_hat’ is updated according to the available categorical information of the series: an attention network compares the past categorical values with the future ones to update the embedded predicted noise.

Also, if auxiliary numerical variables are available both in the past and in the future, the changes in these variables are captured by another attention network.

The goal is to always ensure a valuable computation of ‘eps’, and then refine it whenever enough data is available. Both attentions use { Q = *_future, K = *_past, V = y_past }, exploiting as many context variables as possible for better updates.

Parameters:
  • learn_var (bool) – set if the network has to learn the optimal variance of each step

  • output_channel (int) – number of variables to be predicted

  • future_steps (int) – number of steps in the future, i.e. the number of timesteps to be predicted

  • d_model (int) – hidden dimension of the model

  • num_layers_RNN (int) – number of layers for autoregressive prediction

  • d_head (int) – hidden dimension of each head in the attention networks

  • n_head (int) – number of heads in the attention networks

  • dropout_rate (float)

forward(y_noised, y_past, cat_past, cat_fut, num_past=None, num_fut=None)[source]

DIFFUSION SUBNET

Parameters:
  • y_noised (torch.Tensor) – [B, future_step, num_var]

  • y_past (torch.Tensor) – [B, past_step, num_var]

  • cat_past (torch.Tensor, optional) – [B, past_step, d_model]. Defaults to None.

  • cat_fut (torch.Tensor, optional) – [B, future_step, d_model]. Defaults to None.

  • num_past (torch.Tensor, optional) – [B, past_step, d_model]. Defaults to None.

  • num_fut (torch.Tensor, optional) – [B, future_step, d_model]. Defaults to None.

Returns:

predicted noise [B, future_step, num_var]. If ‘learn_var’ was set at initialization, the subnet also returns a second tensor of the same size containing the variance

Return type:

torch.Tensor

class dsipts.models.Diffusion.SubNet2(aux_past_ch, aux_fut_ch, learn_var, past_steps, future_steps, output_channel, d_model, activation, dropout_rate)[source]

Bases: Module

Initialize internal Module state, shared by both nn.Module and ScriptModule.

__init__(aux_past_ch, aux_fut_ch, learn_var, past_steps, future_steps, output_channel, d_model, activation, dropout_rate)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(y_noised, y_past, cat_past, cat_fut, num_past=None, num_fut=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class dsipts.models.Diffusion.SubNet3(learn_var, flag_aux_num, num_var, d_model, pred_step, num_layers, d_head, n_head, dropout)[source]

Bases: Module

Initialize internal Module state, shared by both nn.Module and ScriptModule.

__init__(learn_var, flag_aux_num, num_var, d_model, pred_step, num_layers, d_head, n_head, dropout)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(y_noised, y_past, cat_past, cat_fut, num_past=None, num_fut=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.