Why do SARIMA coefficients differ in R, Python, and STATA?

82 Views Asked by At

I have been trying to understand how STATA, R, and Python handle SARIMA forecasting differently, and why the same models have produced different predictions. I started trying to match coefficients, and have been unable to get matching results, despite trying many different command parameters. I have tried relatively simple hyperparameters, with and without a constant.

The code I have used for each and the results are listed below. (Note: the data is not stationary.) I have tried with and without constants. STATA finds the MA and season AR terms as not statistically significant, while R and Python finds they are significant. The R and Python results are similar excepting the intercept.

I want to understand what is going on with each process, so I can decide which software to use.

R Code and Results

### R code
ts = c(97.3792037, 95.95151675, 91.31119854, 104.8981482, 105.6913949, 109.9757122, 112.0637876, 107.5345058, 105.7025947, 102.9343384, 99.05647977, 96.41892813, 98.65541846, 96.03831397, 96.16089129, 104.7498141, 104.1124595, 118.547745, 111.2792591, 107.334795, 108.5811032, 104.6114119, 102.2481883, 97.9402986, 97.50978289, 97.18689356, 96.14751328, 104.4959885, 111.2902446, 116.7373247, 111.0557771, 108.2667206, 110.4096783, 104.3825081, 105.0318585, 100.003698, 99.76290231, 100.2887926, 95.43952968, 109.5714367, 112.2008799, 117.2256497, 113.5328799, 113.6856444, 112.138176, 107.5428538, 106.0787703, 100.4712412, 102.7647221, 99.52197604, 93.54484612, 110.8328027, 111.6206782, 115.5910999, 112.3579283, 114.219969, 109.6037091, 107.2897836, 106.0347934, 96.26569125, 99.31431129, 97.28041353, 94.8177262, 108.4643059, 108.629378, 111.9023327, 114.5230675, 113.9775546, 110.0124363, 111.4763949, 110.63024, 105.8363866, 103.6651122, 99.88333821, 99.58756215, 109.5469192, 111.4346437, 113.7493204, 110.9739191, 110.2230651, 110.5110836, 110.3259096, 106.0284998, 106.0575086, 106.3202784, 102.9241831, 103.9779438, 111.4025093, 116.5582041, 124.8018515, 123.7163744, 116.111585, 117.509964, 114.6407459, 112.7685792, 108.6699399, 106.2470851, 107.4548057, 107.4706578, 118.5840337, 119.1763486, 127.357542, 123.0865202, 124.1670008, 122.9045664, 119.0590287, 120.3916487, 114.4012977, 114.4010663, 110.6166835, 115.4359241, 124.4532449, 126.4941134, 134.3627807, 128.3255046, 128.9491699, 126.3461706, 119.4532097, 124.8579134, 120.5122081)
library(forecast)
model = Arima(ts, order = c(1,0,1), seasonal = list(order = c(1,0,0), period = 12), method = "ML")
ar1 ma1 sar1 mean
Coefficients 0.9127 -0.4565 0.8696 111.8831
se 0.0438 0.1094 0.0414 7.6850

Python Code and Results

### Python code
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
ts = [97.3792037, 95.95151675, 91.31119854, 104.8981482, 105.6913949, 109.9757122, 112.0637876, 107.5345058, 105.7025947, 102.9343384, 99.05647977, 96.41892813, 98.65541846, 96.03831397, 96.16089129, 104.7498141, 104.1124595, 118.547745, 111.2792591, 107.334795, 108.5811032, 104.6114119, 102.2481883, 97.9402986, 97.50978289, 97.18689356, 96.14751328, 104.4959885, 111.2902446, 116.7373247, 111.0557771, 108.2667206, 110.4096783, 104.3825081, 105.0318585, 100.003698, 99.76290231, 100.2887926, 95.43952968, 109.5714367, 112.2008799, 117.2256497, 113.5328799, 113.6856444, 112.138176, 107.5428538, 106.0787703, 100.4712412, 102.7647221, 99.52197604, 93.54484612, 110.8328027, 111.6206782, 115.5910999, 112.3579283, 114.219969, 109.6037091, 107.2897836, 106.0347934, 96.26569125, 99.31431129, 97.28041353, 94.8177262,  108.4643059, 108.629378, 111.9023327, 114.5230675, 113.9775546, 110.0124363, 111.4763949, 110.63024, 105.8363866, 103.6651122, 99.88333821, 99.58756215, 109.5469192, 111.4346437, 113.7493204, 110.9739191, 110.2230651, 110.5110836, 110.3259096, 106.0284998, 106.0575086, 106.3202784, 102.9241831, 103.9779438, 111.4025093, 116.5582041, 124.8018515, 123.7163744, 116.111585, 117.509964, 114.6407459, 112.7685792, 108.6699399, 106.2470851, 107.4548057, 107.4706578, 118.5840337, 119.1763486, 127.357542, 123.0865202, 124.1670008, 122.9045664, 119.0590287, 120.3916487, 114.4012977, 114.4010663, 110.6166835, 115.4359241, 124.4532449, 126.4941134, 134.3627807, 128.3255046, 128.9491699, 126.3461706, 119.4532097, 124.8579134, 120.5122081]
ts_df = pd.DataFrame({'ts': ts})
model = SARIMAX(ts_df\['ts'\], order=(1, 0, 1), seasonal_order=(1, 0, 0, 12), trend='c', enforce_stationarity=False)
result = model.fit(disp=0, maxiter=100, method="bfgs", cov_type = "oim")
print(result.summary())
coef std err z P>|z| [0.025 0.975]
intercept 1.4956 1.032 1.449 0.147 -0.528 3.519
ar.L1 0.8969 0.062 14.532 0.000 0.776 1.018
ma.L1 -0.5144 0.119 -4.324 0.000 -0.748 -0.281
ar.S.L12 0.8910 0.049 18.367 0.000 0.796 0.986
sigma2 7.6082 1.040 7.314 0.000 5.569 9.647

STATA Code and Results

* STATA code
clear all
input int time float ts
1   97.3792037
2   95.95151675
3   91.31119854
4   104.8981482
5   105.6913949
6   109.9757122
7   112.0637876
8   107.5345058
9   105.7025947
10  102.9343384
11  99.05647977
12  96.41892813
13  98.65541846
14  96.03831397
15  96.16089129
16  104.7498141
17  104.1124595
18  118.547745
19  111.2792591
20  107.334795
21  108.5811032
22  104.6114119
23  102.2481883
24  97.9402986
25  97.50978289
26  97.18689356
27  96.14751328
28  104.4959885
29  111.2902446
30  116.7373247
31  111.0557771
32  108.2667206
33  110.4096783
34  104.3825081
35  105.0318585
36  100.003698
37  99.76290231
38  100.2887926
39  95.43952968
40  109.5714367
41  112.2008799
42  117.2256497
43  113.5328799
44  113.6856444
45  112.138176
46  107.5428538
47  106.0787703
48  100.4712412
49  102.7647221
50  99.52197604
51  93.54484612
52  110.8328027
53  111.6206782
54  115.5910999
55  112.3579283
56  114.219969
57  109.6037091
58  107.2897836
59  106.0347934
60  96.26569125
61  99.31431129
62  97.28041353
63  94.8177262
64  108.4643059
65  108.629378
66  111.9023327
67  114.5230675
68  113.9775546
69  110.0124363
70  111.4763949
71  110.63024
72  105.8363866
73  103.6651122
74  99.88333821
75  99.58756215
76  109.5469192
77  111.4346437
78  113.7493204
79  110.9739191
80  110.2230651
81  110.5110836
82  110.3259096
83  106.0284998
84  106.0575086
85  106.3202784
86  102.9241831
87  103.9779438
88  111.4025093
89  116.5582041
90  124.8018515
91  123.7163744
92  116.111585
93  117.509964
94  114.6407459
95  112.7685792
96  108.6699399
97  106.2470851
98  107.4548057
99  107.4706578
100 118.5840337
101 119.1763486
102 127.357542
103 123.0865202
104 124.1670008
105 122.9045664
106 119.0590287
107 120.3916487
108 114.4012977
109 114.4010663
110 110.6166835
111 115.4359241
112 124.4532449
113 126.4941134
114 134.3627807
115 128.3255046
116 128.9491699
117 126.3461706
118 119.4532097
119 124.8579134
120 120.5122081
end
tsset time
arima ts, arima(1,0,1) sarima(1,0,0,12) technique(bfgs) diffuse
ts Coefficient std. err. z P>|z| [95% conf. interval]
_cons 110.5248 2.698564 40.96 0.000 105.2357 115.8139
ARMA
ar | L1. .8340737 .0504238 16.54 0.000 .7352449 .9329025
ma | L1. -.0000333 .0003224 -0.10 0.918 -.0006651 .0005986
ARMA12
ar | L1. .0000193 .0001236 0.16 0.876 -.0002229 .0002616
/sigma 4.828537 .3242497 14.89 0.000 4.19302 5.464055
0

There are 0 best solutions below