I want to forecast a Target using its history and the history of covariates (Cov1, Cov2, Cov3).
I have several samples (id), each with 601 observations (time) of (Target, Cov1, Cov2, Cov3), and I want to train my model (a TemporalFusionTransformer) on the first 60 observations to predict the remaining 541 Target values.
I plan to train/validate my model using a pytorch-forecasting TimeSeriesDataSet object and then test it on unseen samples.
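Roughly, the id-level split I have in mind looks like this (just a sketch; the 80/20 split and the names test_ids, df_train, df_test are placeholders of mine):

import numpy as np

# hold out whole samples (ids) for the final test, keep the rest for train/validation
ids = df["id"].unique()
rng = np.random.default_rng(42)
test_ids = rng.choice(ids, size=int(0.2 * len(ids)), replace=False)

df_test = df[df["id"].isin(test_ids)]    # unseen samples, only used at test time
df_train = df[~df["id"].isin(test_ids)]  # samples used to build the TimeSeriesDataSet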
I have read a lot of pytorch-forecasting TimeSeriesDataSet examples (pytorch-forecasting.readthedocs.io, Kaggle notebooks, data scientist posts like https://towardsdatascience.com/all-about-n-hits-the-latest-breakthrough-in-time-series-forecasting-a8ddcb27b0d5, ...), but most of them subset a single time series into consecutive train/validation/test sets.
I can't find many examples that train on several samples and test on others. So my questions are about data preprocessing prior to fitting my model. Here is the code I used:
from pytorch_forecasting import TimeSeriesDataSet

max_prediction_length = 540
max_encoder_length = 61
training_cutoff = df["time"].max() - max_prediction_length

# training dataset: fixed-length encoder/decoder windows, grouped by sample id
training = TimeSeriesDataSet(
    df[lambda x: x.time <= training_cutoff],
    time_idx="time", target="Target", group_ids=["id"],
    min_encoder_length=max_encoder_length, max_encoder_length=max_encoder_length,
    min_prediction_length=max_prediction_length, max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=["Cov1", "Cov2", "Cov3", "Target"],
)
# creating the validation set (predict=True), i.e. predict the last max_prediction_length points in time for each series:
validation = TimeSeriesDataSet.from_dataset(training, df, predict=True, stop_randomization=True)
# create dataloaders for model:
batch_size = 4
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
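For completeness, this is roughly how I then fit the TemporalFusionTransformer on these dataloaders (hyperparameters are placeholders, and the import is pytorch_lightning, which may be lightning.pytorch depending on the installed version):

import pytorch_lightning as pl
from pytorch_forecasting import TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# build the TFT from the training dataset so it picks up the encoder/decoder configuration
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,  # placeholder hyperparameters, not tuned
    hidden_size=16,
    attention_head_size=1,
    dropout=0.1,
    loss=QuantileLoss(),
)

trainer = pl.Trainer(max_epochs=30, gradient_clip_val=0.1)
trainer.fit(tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)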
I'm not sure whether I should pass df[lambda x: x.time <= training_cutoff], as I see in the code examples I found, instead of the full df, given that I already set min_encoder_length=max_encoder_length, max_encoder_length=max_encoder_length, min_prediction_length=max_prediction_length and max_prediction_length=max_prediction_length as length parameters. I also still don't clearly understand how predict=True, stop_randomization=True and train=True/False work together to differentiate the training and validation sets.
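The only check I have managed so far is comparing the number of samples each dataset exposes, which I assume reflects what predict=True does to the validation set:

# quick sanity check: how many (encoder, decoder) samples does each dataset contain?
# my assumption: predict=True keeps only the last max_prediction_length points of each id
print("training samples:  ", len(training))
print("validation samples:", len(validation))
print("number of ids:     ", df["id"].nunique())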
Any help would be appreciated!