How does statsmodels compute fitted values of ARIMA models?

I am confused about how statsmodels ARIMA computes fitted values. Consider a simple AR(1) model fitted to a randomly generated series:

from numpy import array
import statsmodels.api as sm
series = array([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
       -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ])

we can fit the model:

model = sm.tsa.ARIMA(series, order = (1,0,0)).fit()

get estimates of the parameters:

parameters = model.params

and also get fitted values:

fitted_values = model.fittedvalues

How are these fitted_values calculated from the initial data and the parameters estimated by the model?

-- I tried model.params[0] + model.params[1]*series[i-1], but that didn't work.

-- I am also not sure why model.fittedvalues[0] is the same as model.params[0].

I also find the autoregressive models in statsmodels confusing.

The first question here is: what are these three parameters?

>>> model.params
array([0.73930757, 0.0181879 , 0.93490084])

One way to find out is to print the summary of the model-fitting result:

>>> model.summary()
                                SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                   10
Model:                 ARIMA(1, 0, 0)   Log Likelihood                 -13.853
Date:                Sun, 25 Feb 2024   AIC                             33.706
Time:                        19:27:30   BIC                             34.614
Sample:                             0   HQIC                            32.710
                                 - 10                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.7393      0.401      1.842      0.065      -0.047       1.526
ar.L1          0.0182      0.461      0.039      0.969      -0.886       0.922
sigma2         0.9349      0.624      1.499      0.134      -0.288       2.158
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 0.40
Prob(Q):                              1.00   Prob(JB):                         0.82
Heteroskedasticity (H):               1.28   Skew:                            -0.08
Prob(H) (two-sided):                  0.85   Kurtosis:                         2.04
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

In the summary above, you can see that the three parameters (in order) are 'const', 'ar.L1', and 'sigma2'.

Another way is to give your observation data as a pandas Series:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.Series([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
       -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ], name='y')
model = ARIMA(series, order=(1, 0, 0)).fit()

Then, the parameters are provided as a series with a helpful index:

>>> model.params
const     0.739308
ar.L1     0.018188
sigma2    0.934901
dtype: float64

To figure out what these actually are, you have to go into the documentation, where you can find this model definition:

[Image: the model specification from the statsmodels ARIMA documentation]

I could not find an explicit reference to the names 'const', 'ar.L1', and 'sigma2' there, but my best guess is:

  • 'const' is the delta_0 term shown in the model, i.e. the constant trend. Because delta_0 is subtracted from y_t before the AR polynomial is applied, with d = 0 it acts as the mean of the series rather than as an intercept
  • 'ar.L1' is the first autoregressive coefficient in the lag polynomial, Phi(L)
  • 'sigma2' is short-hand for 'sigma-squared', the variance of the white-noise term
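Specialising that documented specification to the question's AR(1) model gives the equations below. This is my own reading, assuming δ_0 = const, φ_1 = ar.L1, σ² = sigma2 and Phi(L) = 1 − φ_1 L. Under it, each fitted value is a one-step-ahead prediction: the first one has no past observation and falls back to the mean δ_0 (which would explain why fittedvalues[0] equals params[0]), and params[0] + params[1]*series[i-1] is off because the equivalent intercept is δ_0(1 − φ_1), not δ_0.

```latex
% AR(1) case: d = 0, no seasonal terms, no exog, constant trend only
\begin{aligned}
y_t - \delta_0 &= \epsilon_t, \qquad (1 - \phi_1 L)\,\epsilon_t = \eta_t, \qquad \eta_t \sim WN(0, \sigma^2) \\
\Rightarrow\; y_t &= \delta_0 + \phi_1 (y_{t-1} - \delta_0) + \eta_t \\[4pt]
\hat{y}_0 &= \mathrm{E}[y_0] = \delta_0 \\
\hat{y}_t &= \delta_0 + \phi_1 (y_{t-1} - \delta_0), \qquad t \ge 1 \\[4pt]
c &= \delta_0 (1 - \phi_1) \approx 0.7393 \times (1 - 0.0182) \approx 0.726
\end{aligned}
```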

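As to how these are estimated: by default, fit() does not use Ordinary Least Squares (OLS). It maximizes the Gaussian likelihood, which is evaluated with the Kalman filter (method='statespace'); other estimators, such as 'yule_walker' or 'burg' for pure AR models, can be selected through the method argument. For more details, refer to the documentation on the fit method.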