I am trying to forecast multiple time series with the fable package in R. It seems to be the most efficient way to do it, but I am very new to R, so I'm running into a lot of problems. I would like some advice and ideas. I already found how to do it using just the forecast package, but in a way that requires a lot of extra steps.

My data is an Excel sheet with 5701 columns and 50 rows. Each column has the name of a product in the first row, and the next 49 values are numbers representing the monthly sales from January 2017 to January 2021.

First, how do I transform that table into a tsibble? I know I need to do that in order to work with fable, but I'm stuck on such a simple step.

Then I would like to get as output a table with the monthly forecast for the next 3 semesters (April 2021 to September 2022), in the format Product|Date|Model ARIMA (values)|Error of ARIMA (value/values)|Model ETS|Error of ETS|Model Naive|Error of Naive, etc.

My main objective is to get a table with Product|best prediction for April 2021 to September 2021|best prediction for October 2021 to March 2022|best prediction for April 2022 to September 2022|
What I was doing was using this code:
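library(readxl)    # read_excel()
library(forecast)  # auto.arima(), forecast()
newdata <- read_excel("ALLINCOLUMNS.xlsx")
# Monthly series from January 2017 (start = 1, end = 49 would span 48 years and recycle rows)
Fcast <- ts(newdata[, 1:5701], start = c(2017, 1), frequency = 12)
# One auto.arima() fit and forecast per product column
output <- lapply(Fcast, function(x) forecast(auto.arima(x)))
prediction <- as.data.frame(output)
write.table(prediction, file = "C:\\Users\\thega\\OneDrive\\Documentos\\finalprediction.csv", sep = ",")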
This gave me, by default, something in the format |Product1.Point.Forecast|Product1.Lo.80|Product1.Hi.80|Product1.Lo.95|Product1.Hi.95|Product2.Point.Forecast|...|Product5701.Hi.95|. I don't need the 80 and 95 intervals, and they make it harder for me to work with the result in Excel.

How can I get something in the format |point forecast product 1|point forecast product 2|...|point forecast product 5701|, showing only the forecast? I know that I have to use level=NULL in the forecast function, but it is not working in the ways I have tried. I was planning to write some code to delete those columns, but that is less elegant.

Finally, is there any way to show all the errors for the methods in a column? I want to add the best method to my table, so I need to check which one has the lowest error.
The {fable} package works best when data is in a tidy format. In your case, the products should be represented across rows instead of columns. You can read more about what tidy data is here: https://r4ds.had.co.nz/tidy-data.html Once you've done that, you can also read about tidy data for time series here: https://otexts.com/fpp3/tsibbles.html
Without having your dataset, I can only guess that your `Fcast` object (the `ts()` data) has each product in its own column (and you have 5701 products, not only the 2 I'll use in an example).
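As a stand-in for the real data, a minimal sketch of such an object can be built from two monthly series that ship with R. The choice of `mdeaths` and `fdeaths` is only an assumption for illustration; they play the role of two product columns.

```r
# Stand-in for the asker's data: two built-in monthly series
# instead of 5701 product columns.
Fcast <- cbind(mdeaths, fdeaths)

# A multivariate ts object: one column per series, one row per month
head(Fcast)
frequency(Fcast)  # 12 observations per year
```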
If you already have the data in a `ts` object, you can use `as_tsibble(<ts>)` to convert it to a tidy time series dataset.

Created on 2021-02-25 by the reprex package (v0.3.0)
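A minimal sketch of that conversion, again using the built-in `mdeaths` and `fdeaths` series as an assumed stand-in for the product columns (the name `sales` is also just illustrative):

```r
library(tsibble)

# Stand-in for the real multivariate ts object
Fcast <- cbind(mdeaths, fdeaths)

# Convert to a long tsibble: one row per (series, month)
sales <- as_tsibble(Fcast, pivot_longer = TRUE)
sales
```

The result should have an `index` column (the month), a `key` column (the series name) and a `value` column.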
Setting `pivot_longer = TRUE` will collect the columns into a long format. This format is suitable for the {fable} package. We now have a `key` column which stores the series name (product ID for your data), and the values are stored in the `value` column.

With the data in an appropriate format, we can now use auto `ARIMA()` and `forecast()` to obtain forecasts.
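A sketch of the ARIMA step under the same assumptions as before (built-in series standing in for products). The horizon `h = 18` is an assumption chosen to match three semesters of monthly data; it is not taken from the original answer.

```r
library(tsibble)
library(fable)  # also attaches fabletools (model(), forecast())

# Stand-in data in long tsibble format
sales <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)

# Fit an automatically selected ARIMA model to every series in the key.
# Note: the automatic order selection may need the feasts package installed.
fit <- model(sales, arima = ARIMA(value))

# Forecast 18 months ahead for each series
fc <- forecast(fit, h = 18)
fc
```

The `.mean` column of the result holds the point forecasts; the distribution column carries the information needed for intervals, so nothing has to be deleted afterwards.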
You can also compute forecasts from other models by specifying several models in `model()`.
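For example, a sketch with three models per series. The particular trio (ARIMA, ETS, seasonal naive) and the model names on the left-hand side are assumptions picked to match the methods mentioned in the question.

```r
library(tsibble)
library(fable)

# Stand-in data in long tsibble format
sales <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)

# Each named argument becomes one model column in the resulting mable
fit <- model(
  sales,
  arima  = ARIMA(value),
  ets    = ETS(value),
  snaive = SNAIVE(value)
)

# One set of forecasts per series and per model
fc <- forecast(fit, h = 18)
fc
```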
The `.model` column now identifies the model used to produce each forecast, of which there are 3 models.

If you want to focus on point forecasts side by side, you can `tidyr::pivot_wider()` the forecast `.mean` values across several columns.
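One possible way to do this, sketched under the same assumptions (stand-in data, three illustrative models). Dropping the distribution column before widening is a choice made here to keep only point forecasts.

```r
library(tsibble)
library(fable)
library(dplyr)
library(tidyr)

sales <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)
fit <- model(sales, arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value))
fc <- forecast(fit, h = 18)

# Keep only the identifiers and the point forecast
point_fc <- select(as_tibble(fc), key, index, .model, .mean)

# One row per series and month, one column per model
wide_fc <- pivot_wider(point_fc, names_from = .model, values_from = .mean)
wide_fc
```

The resulting table can be written out with `write.csv()` for use in Excel. Using `names_from = c(key, .model)` instead would give one column per product and model.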
You can learn how to evaluate accuracy of these models/forecasts here: https://otexts.com/fpp3/accuracy.html
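As a rough starting point, `accuracy()` on the fitted models returns one row of error measures per series and model. The sketch below picks the model with the lowest training RMSE per series; using RMSE and using in-sample errors are both assumptions made for brevity, and the chapter linked above explains why test-set or cross-validated errors are a fairer basis for comparison.

```r
library(tsibble)
library(fable)
library(dplyr)

sales <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)
fit <- model(sales, arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value))

# Training-set error measures (ME, RMSE, MAE, MAPE, ...) per series and model
acc <- accuracy(fit)

# For each series, keep the model with the smallest training RMSE
best <- slice_min(group_by(acc, key), RMSE, n = 1, with_ties = FALSE)
select(ungroup(best), key, .model, RMSE)
```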