I am using the fable package to obtain forecasts on a set of hierarchical time series. I would like to specify a hierarchy, that does not have the same depth at all nodes.
Realistic example:
- Time series B1 and B2 are summed to time series M1.
- Time series M1 and M2 are summed to time series T, which is at the top of the hierachy.
- Time series M2 is not the sum of a set of time series; it is it's own time series.
Creating a small random data set in tsibble
format:
library(dplyr)
library(tsibble)
library(fable)
set.seed(1)
B1 <- rnorm(12, mean = 5) + (1:12)
B2 <- rnorm(12, mean = 5)
M2 <- rnorm(12, mean = 25)
ts_data <- tibble(value = c(B1, B2, M2),
month = rep(yearmonth(paste("2020", 1:12, sep="-")), 3),
B = c(rep("B1", 12), rep("B2", 12), rep("B3", 12)),
M = c(rep("M1", 24), rep("M2", 12))) %>%
as_tsibble(key = c("B", "M"), index = month)
Estimating separate ARIMA models on each of the 3 time series, aggregating and forecasting:
fcsts <- ts_data %>%
# Specify hierarchy
aggregate_key(M / B, value = sum(value)) %>%
# Fit models
model(arima = ARIMA(value)) %>%
# Set up reconciliation
mutate(mint = min_trace(arima)) %>%
# Produce the forecasts
forecast(h = 1)
The reason, why I am worried that the result might be wrong, is that I can create a pathological example, where the reconciliation gives smaller confidence intervals, even though there is no actual aggregation:
Pathological example:
- Time series B3 is the child of M2, which is the child of T.
I create a dataset for this example by subsetting the previous one:
ts_data_2 <- ts_data %>%
filter(B == "B3")
Again estimating separate ARIMA models, aggregating and forecasting:
fcsts_2 <- ts_data_2 %>%
# Specify hierarchy
aggregate_key(M / B, value = sum(value)) %>%
# Fit models
model(arima = ARIMA(value)) %>%
# Set up reconciliation
mutate(mint = min_trace(arima)) %>%
# Produce the forecasts
forecast(h = 6)
The result is as follows:
> fcsts_2
# A fable: 6 x 6 [1M]
# Key: M, B, .model [6]
M B .model month value .distribution
<chr> <chr> <chr> <mth> <dbl> <dist>
1 M2 B3 arima 2021 Jan 24.9 N(25, 0.63)
2 M2 <aggregated> arima 2021 Jan 24.9 N(25, 0.63)
3 <aggregated> <aggregated> arima 2021 Jan 24.9 N(25, 0.63)
4 M2 B3 mint 2021 Jan 24.9 N(25, 0.21)
5 M2 <aggregated> mint 2021 Jan 24.9 N(25, 0.21)
6 <aggregated> <aggregated> mint 2021 Jan 24.9 N(25, 0.21)
The variance decreases from 0.63 in the original ARIMA model to 0.21, even though there is no actual aggregation. Of course, this is an example, where reconciliation shouldn't be used at all, but the fact that the variance decreases here makes me worry that the reconciliation doesn't work correctly in the realistic example.
Is there a way to specify the model in the realistic example to avoid aggregation from B3 up to M2? (I have tried using NA instead of the level "B3" in column B, but this doesn't work.)
The issue here is that the reconciliation is based on an estimate of the covariance matrix of the errors, and the default setting uses a diagonal matrix which results in a severe underestimate when there is a degenerate hierarchy.
A better result (although still with some bias) occurs when you use the
mint_shrink
option:Created on 2020-09-23 by the reprex package (v0.3.0)