How to apply Fable/Forecast (in R) to this database?

717 Views Asked by At

I am trying to forecast a multiple time series with the Fable function in R. It seems the most eficient way to do it, but I am very new using R so I'm currently dealing with a lot of problems. I just wanted to ask someone for advices and Ideas. I already found how to do it just using the forecast function package, but in a way that requires a lot of extra steps. My data is an excel with 5701 columns and 50 rows. Each Column as the name of a product in the first row and the next 49 values are numbers, representing the sales from January 2017 to January 2021. First, how to transform that table into a tsibble? I know I need to do that in order to work with Fable, but I'm stuck in such a simple step. Then I would like to have as output a table with the monthy forecast for the next 3 semesters (april 2021 to september 2022) with Product|Date|Model Arima(values)|error of arima(value/values)|model ETS|Error of ETS|model Naive|error of naive..etc. My main objective is to get a table with product|best prediccion for april2021/september2021|best prediccion for october2021/march2021|best prediccion for april2022/september2022|

What I was doing was using this code:

newdata <- read_excel("ALLINCOLUMNS.xlsx")
Fcast <- ts(newdata[,1:5701], start= c(1), end=c(49), frequency=12)
output <- lapply(Fcast, function(x) forecast(auto.arima(x)))
prediction <- as.data.frame(output)
write.table(prediction, file= "C:\\Users\\thega\\OneDrive\\Documentos\\finalprediction.csv",sep=",")

Which gave to me, by default, something in the format |product1.Point.Forecast||Product1.Lo.80||Product1.Hi.80|Product1.Lo.95|Product1.Hi.95|Product2.Point.Forecast|...|Product5071.Hi.95|... anyway, I dont need the 80 and 95 intervals, and that made more difficult to me to work in excel with it. How to get something in the format: |point forecast product 1|point forecast product 2|....|point forecast product 5701|, showing only the forecast? I know that I have to use level=NULL in the forecast function, but it is not working in the ways I had tried. I was planning to do a programming to delete those columns but it is less elegant. Finally, is there any way to show all the errors for the methods in a column? I want to add to my table the best method so I need to verify which as the less error.

1

There are 1 best solutions below

2
On

The {fable} package works best when data is in a tidy format. In your case, the products should be represented across rows instead of columns. You can read more about what tidy data is here: https://r4ds.had.co.nz/tidy-data.html Once you've done that, you can also read about tidy data for time series here: https://otexts.com/fpp3/tsibbles.html

Without having your dataset, I can only guess that your Fcast object (the ts() data) looks something like this:

Fcast <- cbind(mdeaths,fdeaths)
Fcast
#>          mdeaths fdeaths
#> Jan 1974    2134     901
#> Feb 1974    1863     689
#> Mar 1974    1877     827
#> Apr 1974    1877     677
#> May 1974    1492     522
#> Jun 1974    1249     406
#> Jul 1974    1280     441
#> and so on ...

That is, each of your products have their own column (and you have 5701 products, not only 2 I'll use in an example).

If you already have the data in a ts object, you can use as_tsibble(<ts>) to convert it to a tidy time series dataset.

library(tsibble)
as_tsibble(Fcast, pivot_longer = TRUE)
#> # A tsibble: 144 x 3 [1M]
#> # Key:       key [2]
#>       index key     value
#>       <mth> <chr>   <dbl>
#>  1 1974 Jan fdeaths   901
#>  2 1974 Feb fdeaths   689
#>  3 1974 Mar fdeaths   827
#>  4 1974 Apr fdeaths   677
#>  5 1974 May fdeaths   522
#>  6 1974 Jan mdeaths  2134
#>  7 1974 Feb mdeaths  1863
#>  8 1974 Mar mdeaths  1877
#>  9 1974 Apr mdeaths  1877
#> 10 1974 May mdeaths  1492

Created on 2021-02-25 by the reprex package (v0.3.0)

Setting pivot_longer = TRUE will collect the columns into a long format. This format is suitable for the {fable} package. We now have a key column which stores the series name (product ID for your data), and the values are stored in the value column.

With the data in an appropriate format, we can now use auto ARIMA() and forecast() to obtain forecasts:

library(fable)
#> Loading required package: fabletools
as_tsibble(Fcast, pivot_longer = TRUE) %>% 
  model(ARIMA(value)) %>% 
  forecast()
#> # A fable: 48 x 5 [1M]
#> # Key:     key, .model [2]
#>    key     .model          index        value .mean
#>    <chr>   <chr>           <mth>       <dist> <dbl>
#>  1 fdeaths ARIMA(value) 1980 Jan N(825, 6184)  825.
#>  2 fdeaths ARIMA(value) 1980 Feb N(820, 6184)  820.
#>  3 fdeaths ARIMA(value) 1980 Mar N(767, 6184)  767.
#>  4 fdeaths ARIMA(value) 1980 Apr N(605, 6184)  605.
#>  5 fdeaths ARIMA(value) 1980 May N(494, 6184)  494.
#>  6 fdeaths ARIMA(value) 1980 Jun N(423, 6184)  423.
#>  7 fdeaths ARIMA(value) 1980 Jul N(414, 6184)  414.
#>  8 fdeaths ARIMA(value) 1980 Aug N(367, 6184)  367.
#>  9 fdeaths ARIMA(value) 1980 Sep N(376, 6184)  376.
#> 10 fdeaths ARIMA(value) 1980 Oct N(442, 6184)  442.
#> # … with 38 more rows

Created on 2021-02-25 by the reprex package (v0.3.0)

You can also compute forecasts from other models by specifying several models in model().

Fcast <- cbind(mdeaths,fdeaths)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(fable)
#> Loading required package: fabletools
as_tsibble(Fcast, pivot_longer = TRUE) %>% 
  model(arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value)) %>% 
  forecast()
#> # A fable: 144 x 5 [1M]
#> # Key:     key, .model [6]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths arima  1980 Jan N(825, 6184)  825.
#>  2 fdeaths arima  1980 Feb N(820, 6184)  820.
#>  3 fdeaths arima  1980 Mar N(767, 6184)  767.
#>  4 fdeaths arima  1980 Apr N(605, 6184)  605.
#>  5 fdeaths arima  1980 May N(494, 6184)  494.
#>  6 fdeaths arima  1980 Jun N(423, 6184)  423.
#>  7 fdeaths arima  1980 Jul N(414, 6184)  414.
#>  8 fdeaths arima  1980 Aug N(367, 6184)  367.
#>  9 fdeaths arima  1980 Sep N(376, 6184)  376.
#> 10 fdeaths arima  1980 Oct N(442, 6184)  442.
#> # … with 134 more rows

Created on 2021-02-25 by the reprex package (v0.3.0)

The .model column now identifies the model used to produce each forecast, of which there are 3 models.

If you want to focus on point forecasts side by side, you can tidyr::pivot_wider() the forecast .mean values across several columns.

library(tsibble)
library(fable)
library(tidyr)
Fcast <- cbind(mdeaths,fdeaths)
as_tsibble(Fcast, pivot_longer = TRUE) %>% 
  model(arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value)) %>% 
  forecast() %>% 
  as_tibble() %>% 
  pivot_wider(id_cols = c("key", "index"), names_from = ".model", values_from = ".mean")
#> # A tibble: 48 x 5
#>    key        index arima   ets snaive
#>    <chr>      <mth> <dbl> <dbl>  <dbl>
#>  1 fdeaths 1980 Jan  825.  789.    821
#>  2 fdeaths 1980 Feb  820.  812.    785
#>  3 fdeaths 1980 Mar  767.  746.    727
#>  4 fdeaths 1980 Apr  605.  592.    612
#>  5 fdeaths 1980 May  494.  479.    478
#>  6 fdeaths 1980 Jun  423.  413.    429
#>  7 fdeaths 1980 Jul  414.  394.    405
#>  8 fdeaths 1980 Aug  367.  355.    379
#>  9 fdeaths 1980 Sep  376.  365.    393
#> 10 fdeaths 1980 Oct  442.  443.    411
#> # … with 38 more rows

Created on 2021-02-25 by the reprex package (v0.3.0)

You can learn how to evaluate accuracy of these models/forecasts here: https://otexts.com/fpp3/accuracy.html