Update a model with test data to make new forecasts after training it on training data

I would like to understand whether the procedure I'm following is standard or whether I'm making a mistake. I have a time series of 48 values (one value per month from 2018 to 2021), stored in the DataFrame df:

             Amount
2018-01      125.6 
...          ...
2020-12      145.2     
2021-01      148.4
...          ...
2021-12      198.8
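Assuming the "YYYY-MM" labels above are stored as strings, converting them to a DatetimeIndex with an explicit month-start frequency makes date arithmetic and date-based slicing possible. The sketch below does that. The helper name `to_monthly_index` is hypothetical, and the Amount values are placeholders with the same shape as df, not the real data. With such an index, the same 36/12 split can also be written by date instead of by position:

```python
import pandas as pd

def to_monthly_index(df: pd.DataFrame) -> pd.DataFrame:
    """Convert 'YYYY-MM' string labels to month-start timestamps with freq 'MS'."""
    out = df.copy()
    out.index = pd.to_datetime(out.index, format="%Y-%m")
    # asfreq('MS') attaches the month-start frequency; a missing month would
    # show up as a NaN row instead of being silently skipped
    return out.asfreq("MS")

# Placeholder frame shaped like df: 48 monthly labels, 2018-01 .. 2021-12
labels = [f"{year}-{month:02d}" for year in range(2018, 2022) for month in range(1, 13)]
df = pd.DataFrame({"Amount": [float(i) for i in range(48)]}, index=labels)  # hypothetical values

df = to_monthly_index(df)

# Date-based equivalent of df[:36] and df[36:]
df_train = df.loc[:"2020-12"]
df_test = df.loc["2021"]
```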

I would like to create a model that can predict the amount for any month I choose.

In short, I use the first three years (36 months) to train the model and then test it on the last year (2021), as follows:

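import pandas as pd
import pmdarima as pm

# Chronological split: 2018-2020 (36 months) for training, 2021 (12 months) for testing
df_train = df[:36]
df_test = df[36:]

arima = pm.auto_arima(df_train, error_action='ignore', trace=True,
                      suppress_warnings=True, maxiter=50,
                      seasonal=True, m=12,
                      random_state=1)

# Best model: ARIMA(1,1,1)(0,1,0)[12]

# Forecast the 12 test months, with confidence intervals
predictions, conf_int = arima.predict(n_periods=12, return_conf_int=True)

# set_axis relabels positionally, so this works whether predict returns a
# plain array or a pandas Series that carries its own index
df_predictions = pd.Series(predictions).set_axis(df_test.index).to_frame('Predicted amount')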
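The best model reported by auto_arima, ARIMA(1,1,1)(0,1,0)[12], applies one ordinary difference (d=1) and one seasonal difference at lag 12 (D=1, m=12), and the AR(1) and MA(1) terms are then fitted to the differenced series. A plain-Python sketch of that differencing step follows. The helper names are hypothetical, and the series is a made-up trend plus a repeating 12-month pattern, chosen to show what the two differences remove. Applied explicitly, they leave 48 - 12 - 1 = 35 values:

```python
def difference(values, lag=1):
    """Return [x_t - x_{t-lag}] for t = lag .. len(values) - 1."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

def seasonal_then_ordinary(values, m=12):
    """Apply (1 - B)(1 - B^m): one seasonal difference at lag m, then one ordinary difference."""
    return difference(difference(values, lag=m), lag=1)

# Hypothetical series: 48 months of a linear trend plus a fixed 12-month pattern
pattern = [5, 3, 0, -2, -4, -1, 2, 6, 8, 4, 1, -3]
series = [2 * t + pattern[t % 12] for t in range(48)]

w = seasonal_then_ordinary(series)
print(len(w))   # 35
print(set(w))   # {0}
```

The seasonal difference turns the repeating pattern plus trend into a constant (here 24), and the ordinary difference removes that constant. The two operations commute, so their order does not matter.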

Then I evaluate the predictions with:

from sklearn.metrics import r2_score

r2_score(df_test['Amount'], df_predictions['Predicted amount'])

I get an R² of about 0.92, so everything seems fine. Is this correct so far?
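For reference, r2_score compares the squared forecast errors with the spread of the test values around their own mean: R² = 1 - SS_res / SS_tot. Below is a minimal plain-Python sketch of that formula. The function name `r_squared` is hypothetical, it covers the single-output case only, and it does not handle a constant y_true. It should agree with sklearn.metrics.r2_score:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # squared forecast errors
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # spread around the test-set mean
    return 1 - ss_res / ss_tot

print(round(r_squared([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]), 4))  # 0.9486
```

Because SS_tot is measured around the mean of the test values, an R² computed on the 12 test months measures how much better the forecast does than a flat line at the 2021 average.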

Finally, I want to forecast the 2022 amounts, for which I have no observed values. To do this, I update the model with the test data and repeat the process:

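# Add the 2021 observations to the fitted model; the order found by auto_arima is kept
arima.update(df_test)

forecast_index = pd.date_range(start='2022-01-01', end='2022-12-01', freq='MS')
# Relabel positionally so the forecast is not aligned against a mismatched index
df_forecasts = pd.Series(arima.predict(n_periods=12)).set_axis(forecast_index).to_frame('Forecasted amount')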
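As an alternative to hard-coding the 2022 dates, the forecast index can be derived from the last observed month, so the code keeps working if more data is appended later. The sketch below assumes the data has a month-start DatetimeIndex. It also assumes the point forecast, and optionally the (n_periods, 2) confidence-interval array that predict(..., return_conf_int=True) returns, are passed in. The helper name `forecast_frame`, the Lower/Upper column names, and the numbers in the example are hypothetical:

```python
import numpy as np
import pandas as pd

def forecast_frame(last_observed, forecast, conf_int=None):
    """Label a forecast with the month-start dates that follow last_observed.

    forecast: 1-D array or Series of point forecasts (any index it carries is ignored)
    conf_int: optional (n_periods, 2) array of lower/upper bounds
    """
    values = np.asarray(forecast, dtype=float)
    # Adding MonthBegin(1) to a month-start date gives the next month start
    # (2021-12-01 -> 2022-01-01)
    start = pd.Timestamp(last_observed) + pd.offsets.MonthBegin(1)
    index = pd.date_range(start=start, periods=len(values), freq="MS")
    frame = pd.DataFrame({"Forecasted amount": values}, index=index)
    if conf_int is not None:
        bounds = np.asarray(conf_int, dtype=float)
        frame["Lower"] = bounds[:, 0]
        frame["Upper"] = bounds[:, 1]
    return frame

# In the workflow above this would be called roughly as:
#   fc, ci = arima.predict(n_periods=12, return_conf_int=True)
#   df_forecasts = forecast_frame(df.index[-1], fc, ci)
# Placeholder numbers (not real forecasts):
fc = np.linspace(200.0, 211.0, 12)
ci = np.column_stack([fc - 10.0, fc + 10.0])
df_forecasts = forecast_frame("2021-12-01", fc, ci)
```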

I'm less sure about this last part. Is it correct?

I have given a very concise summary of the procedure, but I would like to understand whether the approach I have followed is standard and correct. Thanks to anyone who can answer.
