My question is: if I had a tsibble with more than one key (n_keys > 1) and one or more key variables (key_vars >= 1), is the tidyverts suite able to perform a Box-Cox transformation on each time series (one box_cox() transformation per series), using the respective lambda_guerrero value for each series? Below is my first attempt at a minimal reproducible example.
For example, I'm wondering whether "step 5" is possible with the tidyverts suite without getting an error. Rather than applying lambda1 = 0.36 to Concessional, General and <aggregated>, as in "step 4" (which runs without error), I'd like to apply 0.25 to Concessional, 0.66 to General and 0.36 to <aggregated>, if possible.
Thank you!
library(tidyverse)
library(lubridate)
library(tsibble)
library(tsibbledata)
library(fabletools)
library(fable)
library(feasts)
library(distributional)
step 1: one key, without Transformation:
tsibbledata::PBS %>% summarize(Cost = sum(Cost)) %>% autoplot(Cost)
step 2: one key, with Transformation:
Similar to an example in FPP3 Section 3.1. For reference: https://otexts.com/fpp3/transformations.html
lambda1 <- tsibbledata::PBS %>%
  summarize(Cost = sum(Cost)) %>%
  features(Cost, features = guerrero) %>%
  pull(lambda_guerrero) # [1] 0.3642197
tsibbledata::PBS %>% summarize(Cost = sum(Cost)) %>% autoplot(box_cox(Cost,lambda1))
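For reference, FPP3 Section 3.1 defines the Box-Cox transformation as log(y) when lambda = 0 and (sign(y) * |y|^lambda - 1) / lambda otherwise. The following is a minimal base-R sketch of that formula under the hypothetical name box_cox_sketch(); it is not the fabletools source. It deliberately accepts a single lambda, which is the one-lambda-per-series constraint this question is about.

```r
# Hypothetical helper: a base-R sketch of the Box-Cox formula from FPP3 Section 3.1.
# Not the fabletools implementation; it takes exactly one lambda per call.
box_cox_sketch <- function(y, lambda) {
  stopifnot(length(lambda) == 1)  # one lambda, i.e. one series at a time
  if (lambda == 0) {
    log(y)                        # lambda = 0 branch assumes positive y
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

box_cox_sketch(c(1, 4, 9), 0.5)  # -> 0 2 4, i.e. (sqrt(y) - 1) / 0.5
box_cox_sketch(exp(2), 0)        # -> 2
```

Passing a length-3 lambda to this sketch stops immediately, rather than silently recycling.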
step 3: three keys, without Transformation:
tsibbledata::PBS %>% aggregate_key(Concession, Cost = sum(Cost)) %>% autoplot(Cost)
step 4: three keys, with one Transformation:
tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  autoplot(box_cox(Cost, lambda1))
step 5: three keys, with three Transformations:
lambda2 <- tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  features(Cost, features = guerrero) %>%
  pull(lambda_guerrero) # [1] 0.2518823 0.6577645 0.3642197
lambda2
# A tibble: 3 x 2
  Concession   lambda_guerrero
  <chr*>                 <dbl>
1 Concessional           0.252
2 General                0.658
3 <aggregated>           0.364
tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  autoplot(box_cox(Cost, lambda2)) # caused an error
The issue with your last attempt is related to the lengths of the values passed to box_cox(Cost, lambda2). Cost has length 612 (204 observations for each of the 3 series), and lambda2 has length 3. So R will replicate the values in lambda2 until the lengths match (this is called "recycling"). However, in this case it recycles them in the wrong pattern. It matches Cost[1] with lambda2[1] (correct), Cost[2] with lambda2[2] (incorrect), Cost[3] with lambda2[3] (incorrect), Cost[4] with lambda2[1] (correct), etc. The correct assignment is that Cost[1:204] uses lambda2[1], Cost[205:408] uses lambda2[2], and Cost[409:612] uses lambda2[3], so we need to ensure this.

This can be done with rep(lambda2, each = 204); however, the best/safest approach is to use a join operation. A join ensures that each parameter matches the correct series (and prevents issues with row ordering, which rep() silently depends on). This can be done with left_join(), which matches the lambda values to the data based on the Concession column.

Note that the resulting plot doesn't look very good, as the transformations (and data) produce values on very different scales. To fix this, I recommend faceting so that each series gets its own y-axis scale.

Created on 2021-01-09 by the reprex package (v0.3.0)
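A toy base-R sketch of the three options discussed above, using made-up Cost values and three series of 4 observations instead of 204 (the lambdas are the rounded lambda_guerrero values from the question). rep_len() makes R's implicit recycling explicit, and base merge() stands in for dplyr::left_join() so the sketch runs without the tidyverts packages. The names toy, lambdas, recycled, by_block and joined are illustrative only.

```r
# Toy stand-in for the aggregated PBS data (made-up values): 3 series x 4
# observations, stacked series by series. The real data has 204 per series.
n_obs <- 4
toy <- data.frame(
  Concession = rep(c("Concessional", "General", "<aggregated>"), each = n_obs),
  Cost = rep(c(10, 20, 30), each = n_obs)
)
# One lambda per series (rounded lambda_guerrero values from the question)
lambdas <- data.frame(
  Concession = c("Concessional", "General", "<aggregated>"),
  lambda_guerrero = c(0.252, 0.658, 0.364)
)

# 1. Plain recycling, as described above: the 3 lambdas cycle row by row,
#    so consecutive rows of one series get different lambdas.
recycled <- rep_len(lambdas$lambda_guerrero, nrow(toy))

# 2. rep(each = ): each lambda repeated n_obs times. Correct only because
#    `toy` is sorted by series in the same order as `lambdas`.
by_block <- rep(lambdas$lambda_guerrero, each = n_obs)
which(recycled != by_block)  # -> 2 3 6 7 10 11 (rows that recycling gets wrong)

# 3. Join on the key column (base merge() in place of dplyr::left_join()):
#    every row picks up the lambda of its own series, whatever the row order.
joined <- merge(toy, lambdas, by = "Concession", all.x = TRUE, sort = FALSE)

# Box-Cox with a per-row lambda (all Cost values are positive here)
joined$Cost_bc <- (joined$Cost^joined$lambda_guerrero - 1) / joined$lambda_guerrero
```

Rows 1 to 4 reproduce the pattern described in the answer (row 1 correct, rows 2 and 3 wrong, row 4 correct). The join yields the same lambdas as rep(each = ) here, but unlike rep() it stays correct if the rows are reordered.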