How to work with normalization in time series forecasting?


I have a dataset that has the following structure:

library(dplyr)
library(tidyr)

dates <- seq(Sys.Date(), by = "1 day", length.out = 150)
variables <- paste("Variable", 1:10, sep = "_")
data_combinations <- expand.grid(Date = dates, Variable = variables)
df <- data_combinations %>% mutate(Count = sample(1:100, nrow(data_combinations), replace = TRUE))

I then split the data by date into a training set (the first 80% of days) and a test set (the remaining days):

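date_vec <- seq.Date(from = min(df$Date), to = max(df$Date), by = "1 day")
data_80_per <- round(0.8 * length(date_vec))
train_data <- df %>% filter(Date %in% date_vec[1:data_80_per])
# Join on the key columns explicitly so every non-training row ends up in the test set
test_data <- anti_join(df, train_data, by = c("Date", "Variable"))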
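As a quick sanity check on the split, here is a minimal sketch in base R (so it runs without extra packages). The fixed seed and start date are assumptions added only for reproducibility. It confirms that the training and test dates do not overlap and that the test set spans exactly 30 days, the same length as the forecast horizon used later:

```r
set.seed(42)  # illustrative seed, not part of the original setup
dates <- seq(as.Date("2024-01-01"), by = "1 day", length.out = 150)  # illustrative start date
variables <- paste("Variable", 1:10, sep = "_")
df <- expand.grid(Date = dates, Variable = variables)
df$Count <- sample(1:100, nrow(df), replace = TRUE)

# Same split as above: first 80% of the days go to training
date_vec <- seq(min(df$Date), max(df$Date), by = "1 day")
n_train_days <- round(0.8 * length(date_vec))
train_dates <- date_vec[seq_len(n_train_days)]
train_data <- df[df$Date %in% train_dates, ]
test_data  <- df[!df$Date %in% train_dates, ]

n_test_days <- length(unique(test_data$Date))
cat("train days:", length(unique(train_data$Date)), " test days:", n_test_days, "\n")
```

Because the test set covers 30 consecutive days, forecast step h from `n.ahead = 30` lines up with the h-th test date.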

I then normalize the training data. I assume that each variable makes up a proportion of a whole, so I scale the counts so that every row (one row per date, with the variables as columns) sums to 1:

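train_data_normalized <- train_data %>%
  group_by(Date) %>%
  # Share of each variable in that day's total count
  mutate(proportion = Count / sum(Count)) %>%
  ungroup() %>%
  dplyr::select(-Count) %>%
  # One row per date, one column per variable
  pivot_wider(names_from = Variable, values_from = proportion) %>%
  # Drop the Date column so only the variables are passed to VAR()
  dplyr::select(-Date)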
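A minimal base-R sketch of the same normalization (on illustrative data with an assumed seed and start date) makes two things explicit: every row of the result sums to 1, and the per-day totals are the only information the normalization discards:

```r
set.seed(42)  # illustrative seed and start date, not from the original
dates <- seq(as.Date("2024-01-01"), by = "1 day", length.out = 150)
variables <- paste("Variable", 1:10, sep = "_")
df <- expand.grid(Date = dates, Variable = variables)
df$Count <- sample(1:100, nrow(df), replace = TRUE)
train_data <- df[df$Date %in% dates[1:120], ]

# Wide count matrix: one row per date (chronological), one column per variable
count_mat <- tapply(train_data$Count, list(train_data$Date, train_data$Variable), sum)

# Each day's total is exactly what the normalization throws away
daily_totals <- rowSums(count_mat)

# Divide every row by its own total so each row sums to 1
prop_mat <- count_mat / daily_totals

print(dim(prop_mat))
print(range(rowSums(prop_mat)))
```

Keeping the daily totals (for the test days as well) is what makes it possible to move between the proportion scale and the count scale later, since `prop_mat * daily_totals` recovers the original counts.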

Now my data frame is in a format where I can fit a VAR model to it and forecast the next 30 days. All the variables are modeled jointly in one model:

fit <- vars::VAR(train_data_normalized, p = 1)

forecasts <- predict(fit, n.ahead = 30)
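The point forecasts in the object returned by `predict()` for a `vars` model are stored in its `fcst` element: a list with one matrix per variable, whose `fcst` column holds the point forecast alongside `lower`, `upper` and `CI`. The helper below, `point_forecast_matrix` (a hypothetical name), is a sketch under that assumption that stacks them into one matrix with a row per forecast day and a column per variable. The `mock` object only imitates that shape so the snippet runs without fitting a model:

```r
# Hypothetical helper: collect the "fcst" column of every per-variable matrix
point_forecast_matrix <- function(pred) {
  do.call(cbind, lapply(pred$fcst, function(m) m[, "fcst"]))
}

# Mock object imitating the shape of predict(fit, n.ahead = 30) for 3 variables
set.seed(1)
var_names <- paste("Variable", 1:3, sep = "_")
mock <- list(fcst = lapply(setNames(nm = var_names), function(v) {
  m <- matrix(runif(30 * 4), nrow = 30, ncol = 4)
  colnames(m) <- c("fcst", "lower", "upper", "CI")
  m
}))

fc <- point_forecast_matrix(mock)
print(dim(fc))
print(colnames(fc))
```

With the real model, `point_forecast_matrix(forecasts)` would give a 30-by-10 matrix of forecast proportions whose row h corresponds to the h-th test day.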

The problem I am encountering is that the forecasts are on the normalized (proportion) scale of the training data, while the test data contains counts on the original scale. I want to calculate accuracy metrics such as MSE or MAE to compare the 30-day forecasts with the actuals in the test set. How can I do this?
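One possible way to put both sides on the same scale, sketched in base R below, is either to normalize the test counts exactly as the training data was normalized and compare proportions, or to multiply the forecast proportions by each test day's observed total and compare counts. The forecast matrix here is a random stand-in (an assumption so the snippet runs on its own); in practice it would be the 30-by-10 matrix of point forecasts from the VAR, with rows in date order and columns in the same variable order as the test data:

```r
# Error metrics over matrices of the same shape (rows = days, columns = variables)
mse <- function(actual, predicted) mean((actual - predicted)^2)
mae <- function(actual, predicted) mean(abs(actual - predicted))

set.seed(42)  # illustrative test data, not from the original
variables <- paste("Variable", 1:10, sep = "_")
test_dates <- seq(as.Date("2024-04-30"), by = "1 day", length.out = 30)
test_data <- expand.grid(Date = test_dates, Variable = variables)
test_data$Count <- sample(1:100, nrow(test_data), replace = TRUE)

# Stand-in for the VAR forecast proportions (each row sums to 1)
raw <- matrix(runif(30 * 10), nrow = 30, dimnames = list(NULL, variables))
fc_prop <- raw / rowSums(raw)

# Actual counts as a wide matrix: rows in date order, columns in forecast order
actual_counts <- tapply(test_data$Count, list(test_data$Date, test_data$Variable), sum)[, variables]
daily_totals <- rowSums(actual_counts)

# Option 1: compare on the proportion scale
actual_prop <- actual_counts / daily_totals
mse_prop <- mse(actual_prop, fc_prop)
mae_prop <- mae(actual_prop, fc_prop)

# Option 2: rescale the forecasts to counts using each test day's observed total
fc_counts <- fc_prop * daily_totals
mse_count <- mse(actual_counts, fc_counts)
mae_count <- mae(actual_counts, fc_counts)

cat(sprintf("proportions: MSE %.5f, MAE %.5f\n", mse_prop, mae_prop))
cat(sprintf("counts:      MSE %.2f, MAE %.2f\n", mse_count, mae_count))
```

Option 1 measures only how well the model forecasts the shares. Option 2 reports errors in counts, but it borrows the observed test totals, which would not be known at forecast time; producing genuine count forecasts would also require a separate forecast of the daily totals.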
