I am new to time series forecasting and came across PyTorch Forecasting a few months ago.
I was wondering whether there is any way to convert a TimeSeriesDataSet object, or the dataloader built from it, to a pandas DataFrame.
I saw this post: How to convert torch tensor to pandas dataframe?, which shows that it can be done; however, the difficult part is mapping the columns and understanding the different parts within a tensor object.
Take the temporal fusion transformer tutorial as an example: https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/stallion.html
max_prediction_length = 6  # defined earlier in the tutorial
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["agency", "sku"],
    static_reals=["avg_population_2017", "avg_yearly_household_income_2017"],
    time_varying_known_categoricals=["special_days", "month"],
    variable_groups={"special_days": special_days},  # group of categorical variables can be treated as one variable
    time_varying_known_reals=["time_idx", "price_regular", "discount_in_percent"],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "volume",
        "log_volume",
        "industry_volume",
        "soda_volume",
        "avg_max_temp",
        "avg_volume_by_agency",
        "avg_volume_by_sku",
    ],
    target_normalizer=GroupNormalizer(
        groups=["agency", "sku"], transformation="softplus"
    ),  # use softplus and normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)
# create validation set (predict=True) which means to predict the last max_prediction_length points in time
# for each series
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
# create dataloaders for model
batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)
# calculate baseline mean absolute error, i.e. predict next value as the last available value from the history
actuals = torch.cat([y for x, (y, weight) in iter(val_dataloader)])
baseline_predictions = Baseline().predict(val_dataloader)
(actuals - baseline_predictions).abs().mean().item()
How would one convert validation or val_dataloader to a pd.DataFrame object?
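To illustrate what I am after, here is the generic pattern I know for a plain 2-D tensor (a minimal sketch with made-up data and column names; whether TimeSeriesDataSet exposes the matching column order somewhere, e.g. via something like validation.reals, is exactly what I am unsure about):

```python
import torch
import pandas as pd

# Made-up stand-ins for validation.data["reals"] and its column order.
reals = torch.tensor(
    [[-0.9593, -0.6123, 0.0000],
     [-0.9593, -0.6123, 0.0000]]
)
columns = ["avg_population_2017", "avg_yearly_household_income_2017", "relative_time_idx"]

# Generic tensor -> DataFrame conversion: move to numpy, attach column names.
reals_df = pd.DataFrame(reals.numpy(), columns=columns)
print(reals_df.shape)  # (2, 3)
```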
Moreover, is it possible to convert the prediction result (a torch.Tensor) into a DataFrame as well?
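For the predictions, I picture something like the sketch below (fake numbers; I am assuming the prediction tensor has one row per series and one column per prediction step, which seems consistent with the 350 × 6 shape I get further down):

```python
import torch
import pandas as pd

# Fake predictions: 3 series, 2 steps ahead (stand-in for a (350, 6) tensor).
predictions = torch.tensor(
    [[84.24, 84.24],
     [43.85, 43.85],
     [25.72, 25.72]]
)
pred_df = pd.DataFrame(
    predictions.numpy(),
    columns=[f"t+{step}" for step in range(1, predictions.shape[1] + 1)],
)

# A long format with one row per (series, step) would be even handier:
long_df = (
    pred_df.rename_axis("series")
    .reset_index()
    .melt(id_vars="series", var_name="horizon", value_name="prediction")
)
```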
validation:
>>> validation.data
{'reals': tensor([[-0.9593, -0.6123, 0.0000, ..., -2.9171, -1.0676, 1.0738],
[-0.9593, -0.6123, 0.0000, ..., -2.1644, -1.0561, 1.3626],
[-0.9593, -0.6123, 0.0000, ..., -0.9712, -1.0254, 1.6461],
...,
[ 1.2221, 1.2074, 0.0000, ..., -0.2065, 1.4175, -1.4105],
[ 1.2221, 1.2074, 0.0000, ..., -0.1767, 0.9821, -1.4105],
[ 1.2221, 1.2074, 0.0000, ..., -1.3639, 1.3839, -1.4105]]),
'categoricals': tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 4],
[ 0, 0, 3, ..., 0, 7, 5],
...,
[57, 17, 0, ..., 0, 0, 1],
[57, 17, 0, ..., 1, 0, 2],
[57, 17, 0, ..., 0, 0, 3]]),
'groups': tensor([[ 0, 0],
[ 0, 0],
[ 0, 0],
...,
[57, 17],
[57, 17],
[57, 17]]),
'target': [tensor([8.0676e+01, 9.8064e+01, 1.3370e+02, ..., 9.9000e-01, 9.0000e-02,
2.2500e+00])],
'weight': None,
'time': tensor([ 0, 1, 2, ..., 57, 58, 59])}
Attempt:
>>> actuals = torch.cat([y for x, (y, weight) in iter(val_dataloader)])
>>> baseline_predictions = Baseline().predict(val_dataloader)
>>> prediction_df = pd.DataFrame(baseline_predictions.numpy())
>>> prediction_df.columns = validation.data.keys()
>>> prediction_df
        reals  categoricals       groups       target       weight         time
0 84.239998 84.239998 84.239998 84.239998 84.239998 84.239998
1 43.848000 43.848000 43.848000 43.848000 43.848000 43.848000
2 25.718399 25.718399 25.718399 25.718399 25.718399 25.718399
3 15.208200 15.208200 15.208200 15.208200 15.208200 15.208200
4 25.240499 25.240499 25.240499 25.240499 25.240499 25.240499
... ... ... ... ... ... ...
345 349.228790 349.228790 349.228790 349.228790 349.228790 349.228790
346 2053.746094 2053.746094 2053.746094 2053.746094 2053.746094 2053.746094
347 2207.361816 2207.361816 2207.361816 2207.361816 2207.361816 2207.361816
348 77.437500 77.437500 77.437500 77.437500 77.437500 77.437500
349 2.520000 2.520000 2.520000 2.520000 2.520000 2.520000
350 rows × 6 columns
Ideally, the result would look something like the original dataframe, with meaningful column names and group/time indices.
The result above does not make much sense. Am I missing something? How would I map it back to the correct column names and indices?
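One thing I noticed while staring at the output: since Baseline just repeats the last available value from the history (as the tutorial comment says), every prediction step gets the same value. So the six identical columns are presumably the six prediction steps (max_prediction_length), not the six keys of validation.data. A toy reproduction of that pattern (made-up numbers):

```python
import torch
import pandas as pd

# Last observed value per series (made-up), repeated across the horizon,
# mimicking what a naive "repeat last value" baseline produces.
last_values = torch.tensor([84.24, 43.85])
horizon = 6  # matches max_prediction_length in the tutorial
baseline = last_values.unsqueeze(1).repeat(1, horizon)  # shape (n_series, horizon)

df = pd.DataFrame(
    baseline.numpy(),
    columns=[f"step_{i}" for i in range(1, horizon + 1)],
)
# All columns are identical, just like in my attempt above.
```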