dsipts.models.Diffusion module

class dsipts.models.Diffusion.Diffusion(d_model, out_channels, past_steps, future_steps, past_channels, future_channels, embs, learn_var, cosine_alpha, diffusion_steps, beta, gamma, n_layers_RNN, d_head, n_head, dropout_rate, activation, subnet, perc_subnet_learning_for_step, persistence_weight=0.0, loss_type='l1', quantiles=[], optim=None, optim_config=None, scheduler_config=None, **kwargs)[source]

Bases: Base

Denoising Diffusion Probabilistic Model

Parameters:
  • d_model (int)

  • out_channels (int) – number of target variables

  • past_steps (int) – size of past window

  • future_steps (int) – size of future window to be predicted

  • past_channels (int) – number of variables available for the past context

  • future_channels (int) – number of variables known in the future, available for forecasting

  • embs (list[int]) – categorical variables dimensions for embeddings

  • learn_var (bool) – if True, the model learns the posterior variance; otherwise it uses the closed-form variance of the posterior distribution

  • cosine_alpha (bool) – if True, generate the alphas and betas with a cosine schedule

  • diffusion_steps (int) – number of noising steps for the initial sample

  • beta (float) – initial value used to generate the diffusion perturbations. Ignored if cosine_alpha == True

  • gamma (float) – trade-off weight balancing the noise-prediction loss against the Negative Log-Likelihood / KL-divergence term

  • n_layers_RNN (int) – param for subnet

  • d_head (int) – param for subnet

  • n_head (int) – param for subnet

  • dropout_rate (float) – param for subnet

  • activation (str) – param for subnet

  • subnet (int) – 1 for the attention subnet, 2 for the linear subnet. Other subnets can be added

  • perc_subnet_learning_for_step (float) – fraction of subnets to train on each batch. Decrease this value if the loss blows up.

  • persistence_weight (float, optional) – Defaults to 0.0.

  • loss_type (str, optional) – Defaults to ‘l1’.

  • quantiles (List[float], optional) – Only [] accepted. Defaults to [].

  • optim (Union[str,None], optional) – Defaults to None.

  • optim_config (Union[dict,None], optional) – Defaults to None.

  • scheduler_config (Union[dict,None], optional) – Defaults to None.
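The schedule parameters above (beta, cosine_alpha, diffusion_steps) can be illustrated with a standalone sketch. The function names and the linear default below are assumptions for illustration, not the library's API:

```python
import numpy as np

def linear_betas(diffusion_steps, beta_start=1e-4, beta_end=0.02):
    # Linear schedule: betas grow linearly from beta_start to beta_end.
    return np.linspace(beta_start, beta_end, diffusion_steps)

def cosine_alpha_bars(diffusion_steps, s=0.008):
    # Cosine schedule (Nichol & Dhariwal, 2021): alpha_bar follows a
    # squared-cosine curve and the per-step betas are derived from it,
    # which is presumably what cosine_alpha=True selects.
    t = np.linspace(0, diffusion_steps, diffusion_steps + 1) / diffusion_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return alpha_bar[1:], np.clip(betas, 0.0, 0.999)
```

With cosine_alpha == True the beta argument is ignored and the betas come entirely from the cosine curve.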

handle_multivariate = False
handle_future_covariates = True
handle_categorical_variables = True
handle_quantile_loss = False
description = 'Can NOT  handle multivariate output \nCan   handle future covariates\nCan   handle categorical covariates\nCan NOT  handle Quantile loss function'
__init__(d_model, out_channels, past_steps, future_steps, past_channels, future_channels, embs, learn_var, cosine_alpha, diffusion_steps, beta, gamma, n_layers_RNN, d_head, n_head, dropout_rate, activation, subnet, perc_subnet_learning_for_step, persistence_weight=0.0, loss_type='l1', quantiles=[], optim=None, optim_config=None, scheduler_config=None, **kwargs)[source]

Denoising Diffusion Probabilistic Model

Parameters:
  • d_model (int)

  • out_channels (int) – number of target variables

  • past_steps (int) – size of past window

  • future_steps (int) – size of future window to be predicted

  • past_channels (int) – number of variables available for the past context

  • future_channels (int) – number of variables known in the future, available for forecasting

  • embs (list[int]) – categorical variables dimensions for embeddings

  • learn_var (bool) – if True, the model learns the posterior variance; otherwise it uses the closed-form variance of the posterior distribution

  • cosine_alpha (bool) – if True, generate the alphas and betas with a cosine schedule

  • diffusion_steps (int) – number of noising steps for the initial sample

  • beta (float) – initial value used to generate the diffusion perturbations. Ignored if cosine_alpha == True

  • gamma (float) – trade-off weight balancing the noise-prediction loss against the Negative Log-Likelihood / KL-divergence term

  • n_layers_RNN (int) – param for subnet

  • d_head (int) – param for subnet

  • n_head (int) – param for subnet

  • dropout_rate (float) – param for subnet

  • activation (str) – param for subnet

  • subnet (int) – 1 for the attention subnet, 2 for the linear subnet. Other subnets can be added

  • perc_subnet_learning_for_step (float) – fraction of subnets to train on each batch. Decrease this value if the loss blows up.

  • persistence_weight (float, optional) – Defaults to 0.0.

  • loss_type (str, optional) – Defaults to ‘l1’.

  • quantiles (List[float], optional) – Only [] accepted. Defaults to [].

  • optim (Union[str,None], optional) – Defaults to None.

  • optim_config (Union[dict,None], optional) – Defaults to None.

  • scheduler_config (Union[dict,None], optional) – Defaults to None.

forward(batch)[source]

Training step of the diffusion network

Parameters:

batch (dict) – variables loaded

Returns:

total loss for the noise predictions over all sampled subnets

Return type:

float
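The gamma trade-off described in the parameters can be sketched as a hybrid loss; MSE is used here purely for illustration, while the model's loss_type may be 'l1':

```python
import numpy as np

def hybrid_loss(eps, eps_hat, vlb_term, gamma):
    # Loss on the predicted noise plus the NLL/KL term, weighted by gamma.
    noise_loss = np.mean((eps - eps_hat) ** 2)
    return noise_loss + gamma * vlb_term
```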

inference(batch)[source]

Inference process to forecast future y

Parameters:

batch (dict) – Keys checked: [‘x_num_past’, ‘idx_target’, ‘x_num_future’, ‘x_cat_past’, ‘x_cat_future’]

Returns:

generated sequence [batch_size, future_steps, num_var]

Return type:

torch.Tensor
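A single reverse step of the inference loop can be sketched in closed form (standard DDPM ancestral sampling; this is an illustration, not the module's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_sample_step(x_t, eps_hat, t, betas, alpha_bars):
    # Estimate x_0 from the predicted noise, then sample x_{t-1}
    # from the posterior q(x_{t-1} | x_t, x_0).
    alpha_t = 1.0 - betas[t]
    a_bar_t = alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    x0_hat = (x_t - np.sqrt(1.0 - a_bar_t) * eps_hat) / np.sqrt(a_bar_t)
    mean = (np.sqrt(a_bar_prev) * betas[t] / (1.0 - a_bar_t)) * x0_hat \
         + (np.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)) * x_t
    var = betas[t] * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(var) * noise
```

Iterating this from t = diffusion_steps - 1 down to 0 yields the generated future sequence.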

cat_categorical_vars(batch)[source]

Extract the categorical context for past and future

Parameters:

batch (dict) – Keys checked -> [‘x_cat_past’, ‘x_cat_future’]

Returns:

cat_emb_past, cat_emb_fut

Return type:

List[torch.Tensor, torch.Tensor]

remove_var(tensor, indexes_to_exclude, dimension)[source]

Remove variables from a tensor at chosen positions along a given dimension

Parameters:
  • tensor (torch.Tensor) – starting tensor

  • indexes_to_exclude (list) – indexes along the chosen dimension that we want to exclude

  • dimension (int) – the single dimension of the tensor to operate on (not a list of dimensions)

Returns:

new tensor without the chosen variables

Return type:

torch.Tensor
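The semantics of remove_var can be illustrated with a numpy analog (illustration only, not the actual implementation):

```python
import numpy as np

# A [batch, steps, vars] tensor with 4 variables on the last dimension.
x = np.arange(24).reshape(2, 3, 4)

# Drop variables 1 and 3 along dimension 2, keeping variables 0 and 2.
reduced = np.delete(x, [1, 3], axis=2)
```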

improving_weight_during_training()[source]

Each time we sample from the multinomial distribution we subtract the minimum weight for more precise sampling, avoiding large learning differences among subnets.

This leads to more stable inference even in early training, mainly for the common context embedding.

Since the sampling weights must be strictly positive, we subtract (min - 1).
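A minimal sketch of the weight shift (illustrative names; the actual attribute layout may differ):

```python
import numpy as np

def shifted_sampling_probs(weights):
    # Subtract (min - 1) so every weight stays strictly positive,
    # then normalize into multinomial sampling probabilities.
    w = np.asarray(weights, dtype=float)
    w = w - (w.min() - 1.0)
    return w / w.sum()
```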

q_sample(x_start, t)[source]

Diffuse x_start for t diffusion steps.

In other words, sample from q(x_t | x_0).

Also, compute the mean and variance of the diffusion posterior:

q(x_{t-1} | x_t, x_0)

The posterior mean and variance are the quantities to be predicted

Parameters:
  • x_start (torch.Tensor) – values to be predicted

  • t (int) – diffusion step

Returns:

the noised sample, the posterior mean, the posterior log variance and the actual noise

Return type:

List[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
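The closed-form forward noising and the posterior statistics can be sketched as follows (standard DDPM formulas; an illustration rather than the module's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)

def q_sample(x_start, t, alpha_bars):
    # Sample x_t ~ q(x_t | x_0) in closed form.
    eps = rng.standard_normal(x_start.shape)
    x_t = np.sqrt(alpha_bars[t]) * x_start + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def posterior_mean_logvar(x_start, x_t, t, betas, alpha_bars):
    # Mean and log-variance of q(x_{t-1} | x_t, x_0).
    a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef1 = np.sqrt(a_bar_prev) * betas[t] / (1.0 - alpha_bars[t])
    coef2 = np.sqrt(1.0 - betas[t]) * (1.0 - a_bar_prev) / (1.0 - alpha_bars[t])
    var = betas[t] * (1.0 - a_bar_prev) / (1.0 - alpha_bars[t])
    return coef1 * x_start + coef2 * x_t, np.log(var)
```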

normal_kl(mean1, logvar1, mean2, logvar2)[source]

Compute the KL divergence between two Gaussians, also called relative entropy. The KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P: KL(P||Q) = E_P[log(P/Q)].

In the context of machine learning, KL(P||Q) is often called the ‘information gain’ achieved if P were used instead of the currently used Q.

Shapes are automatically broadcast, so batches can be compared to scalars, among other use cases.
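The closed-form KL between two diagonal Gaussians, as a sketch of what this method presumably computes:

```python
import numpy as np

def normal_kl(mean1, logvar1, mean2, logvar2):
    # KL( N(mean1, exp(logvar1)) || N(mean2, exp(logvar2)) ), elementwise.
    # numpy broadcasting lets batches be compared against scalars.
    return 0.5 * (logvar2 - logvar1
                  + np.exp(logvar1 - logvar2)
                  + (mean1 - mean2) ** 2 * np.exp(-logvar2)
                  - 1.0)
```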

gaussian_likelihood(x, mean, var)[source]
gaussian_log_likelihood(x, mean, var)[source]
class dsipts.models.Diffusion.SubNet1(aux_past_ch, aux_fut_ch, learn_var, output_channel, d_model, d_head, n_head, activation, dropout_rate)[source]

Bases: Module

-> SUBNET of the DIFFUSION MODEL (DDPM)

It starts with an autoregressive LSTM network that computes epsilon, which is then subtracted from the ‘y_noised’ tensor. This is always possible! We thus obtain an approximation of ‘eps_hat’, which at the end passes through a residual connection with its embedded version ‘emb_eps_hat’.

‘emb_eps_hat’ is updated according to the available categorical information of the series: an attention network compares the past categorical values with the future ones to update the embedded predicted noise.

Also, if auxiliary numerical variables are available both in the past and in the future, the changes in these variables are captured by another attention network.

The goal is to always ensure a valuable computation of ‘eps’, and then refine it whenever enough data is available. Both attentions use { Q = *_future, K = *_past, V = y_past }, exploiting as many context variables as possible for better updates.

Parameters:
  • learn_var (bool) – set if the network has to learn the optimal variance of each step

  • output_channel (int) – number of variables to be predicted

  • future_steps (int) – number of steps in the future, i.e. the number of timesteps to be predicted

  • d_model (int) – hidden dimension of the model

  • num_layers_RNN (int) – number of layers for autoregressive prediction

  • d_head (int) – hidden dimension of each head in the attention networks

  • n_head (int) – number of heads in the attention networks

  • dropout_rate (float)

__init__(aux_past_ch, aux_fut_ch, learn_var, output_channel, d_model, d_head, n_head, activation, dropout_rate)[source]

-> SUBNET of the DIFFUSION MODEL (DDPM)

It starts with an autoregressive LSTM network that computes epsilon, which is then subtracted from the ‘y_noised’ tensor. This is always possible! We thus obtain an approximation of ‘eps_hat’, which at the end passes through a residual connection with its embedded version ‘emb_eps_hat’.

‘emb_eps_hat’ is updated according to the available categorical information of the series: an attention network compares the past categorical values with the future ones to update the embedded predicted noise.

Also, if auxiliary numerical variables are available both in the past and in the future, the changes in these variables are captured by another attention network.

The goal is to always ensure a valuable computation of ‘eps’, and then refine it whenever enough data is available. Both attentions use { Q = *_future, K = *_past, V = y_past }, exploiting as many context variables as possible for better updates.

Parameters:
  • learn_var (bool) – set if the network has to learn the optimal variance of each step

  • output_channel (int) – number of variables to be predicted

  • future_steps (int) – number of steps in the future, i.e. the number of timesteps to be predicted

  • d_model (int) – hidden dimension of the model

  • num_layers_RNN (int) – number of layers for autoregressive prediction

  • d_head (int) – hidden dimension of each head in the attention networks

  • n_head (int) – number of heads in the attention networks

  • dropout_rate (float)

forward(y_noised, y_past, cat_past, cat_fut, num_past=None, num_fut=None)[source]

DIFFUSION SUBNET

Parameters:
  • y_noised (torch.Tensor) – [B, future_step, num_var]

  • y_past (torch.Tensor) – [B, past_step, num_var]

  • cat_past (torch.Tensor, optional) – [B, past_step, d_model]. Defaults to None.

  • cat_fut (torch.Tensor, optional) – [B, future_step, d_model]. Defaults to None.

  • num_past (torch.Tensor, optional) – [B, past_step, d_model]. Defaults to None.

  • num_fut (torch.Tensor, optional) – [B, future_step, d_model]. Defaults to None.

Returns:

predicted noise [B, future_step, num_var]. If ‘learn_var’ was set at initialization, the subnet also returns a second tensor of the same size containing the variance

Return type:

torch.Tensor

class dsipts.models.Diffusion.SubNet2(aux_past_ch, aux_fut_ch, learn_var, past_steps, future_steps, output_channel, d_model, activation, dropout_rate)[source]

Bases: Module

Initialize internal Module state, shared by both nn.Module and ScriptModule.

__init__(aux_past_ch, aux_fut_ch, learn_var, past_steps, future_steps, output_channel, d_model, activation, dropout_rate)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(y_noised, y_past, cat_past, cat_fut, num_past=None, num_fut=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class dsipts.models.Diffusion.SubNet3(learn_var, flag_aux_num, num_var, d_model, pred_step, num_layers, d_head, n_head, dropout)[source]

Bases: Module

Initialize internal Module state, shared by both nn.Module and ScriptModule.

__init__(learn_var, flag_aux_num, num_var, d_model, pred_step, num_layers, d_head, n_head, dropout)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(y_noised, y_past, cat_past, cat_fut, num_past=None, num_fut=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.