I am building a seasonal ARIMA model using the SARIMAX class from statsmodels. The following is an illustration of the model:
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
date_range = pd.date_range(start='2000-01-01', end='2009-12-01', freq='MS')
values = np.random.rand(len(date_range))
ts_full = pd.Series(values, index=date_range)
train = ts_full[:-12]
mdl = SARIMAX(train, order=(1, 0, 0), seasonal_order=(1, 0, 0, 12)).fit()
Now I want to see how the model performs on the last 12 months of data, which it was not trained on, without refitting the model every month. I tried the following:
mdl.predict(ts_full)
Which results in the following error:
TypeError: Cannot convert input [2000-01-01 0.509615
2000-02-01 0.094391
2000-03-01 0.454202
2000-04-01 0.489502
.
.
.
2009-10-01 0.167847
2009-11-01 0.625154
2009-12-01 0.621803
Freq: MS, dtype: float64] of type <class 'pandas.core.series.Series'> to Timestamp
I found several prediction examples online, but they all either predict only one period ahead or require that the model be refit every time period before making the forecast. Is there any way to make the prediction using data on which the model was not trained?
I had the same issue, and I solved it by using SARIMAXResults.apply (https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.apply.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.apply). It applies the fit results to new data without re-estimating the parameters. To make a new prediction based on new data, I call apply on the original fit results (results) and pass it the new data.