ValueError: math domain error in AutoARIMA model of StatsForecast

I am trying to use StatsForecast's AutoARIMA to forecast data of the following form:

zip_code  product_family  week_date   rma_count
12198     ABC             2021-01-03  6.0
61022     DEF             2021-01-03  1.0
43106     GHI             2021-01-03  4.0
18019     XYZ             2021-01-03  3.0

I have two years of data for training and am forecasting over a 13-week horizon.

I call model.fit as follows:

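from statsforecast import StatsForecast
from statsforecast.models import (
    AutoARIMA, DynamicOptimizedTheta, HoltWinters, Naive, SeasonalNaive,
)

# Weekly data, so one seasonal cycle is 52 observations.
models = [
    AutoARIMA(season_length=52),
    HoltWinters(season_length=52, error_type='A'),
    DynamicOptimizedTheta(season_length=52, decomposition_type="additive"),
    SeasonalNaive(season_length=52),
]

# Naive() is the fallback for any series on which a model fails.
model = StatsForecast(models=models, freq='W', n_jobs=-1, fallback_model=Naive())
model.fit(train_agg)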

For AutoARIMA, however, I get "ValueError: math domain error". The stack trace is below:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/miniconda3/lib/python3.8/site-packages/statsforecast/core.py", line 77, in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)
  File "/miniconda3/lib/python3.8/site-packages/statsforecast/models.py", line 328, in fit
    self.model_ = auto_arima_f(
  File "/miniconda3/lib/python3.8/site-packages/statsforecast/arima.py", line 1828, in auto_arima_f
    fit = Arima(x, order=(0, 0, 0), include_mean=False)
  File "/miniconda3/lib/python3.8/site-packages/statsforecast/arima.py", line 1487, in Arima
    tmp["bic"] = tmp["aic"] + npar * (math.log(nstar) - 2)

Looking at the code that raises the error, it seems that nstar comes out as zero or negative in the equation below, so math.log(nstar) fails:

nstar = n - tmp["arma"][5] - tmp["arma"][6] * tmp["arma"][4]
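As a minimal sketch of why this line can fail: math.log raises ValueError whenever its argument is zero or negative. The helper name log_nstar and the values of n and arma below are made up for illustration. Only the expression itself is taken from the traceback, and what each arma entry means inside the library is not stated in the question.

```python
import math

def log_nstar(n, arma):
    # Same expression as in the traceback; n and arma are stand-in inputs.
    nstar = n - arma[5] - arma[6] * arma[4]
    return math.log(nstar)  # raises ValueError when nstar <= 0

# Illustrative values only: arma[4] = 52, arma[5] = 0, arma[6] = 1.
arma = [0, 0, 0, 0, 52, 0, 1]

# 104 - 0 - 1 * 52 = 52, so the logarithm is defined.
ok_value = log_nstar(104, arma)
print(ok_value)

# 40 - 0 - 1 * 52 = -12, so math.log raises ValueError.
try:
    log_nstar(40, arma)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```

So whatever makes the effective number of observations this small for a given series will trigger the error for that series.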

All of these values are computed internally during fitting, so I do not set them myself. Is there a parameter I can pass to fix this error, or is it a data issue, such as the counts being small?

There is 1 best solution below

Found the problem.

I was running hierarchical forecasting on multiple time series, and some of those series had missing data points.

For my purposes those series could be ignored, so I removed them. Alternatively, you can fill the missing data points with a constant value.
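Both options can be sketched with pandas. This is only an illustration under assumptions: the data is taken to be in a long format with columns unique_id, ds and y, the toy values are made up, weeks are assumed to be Sunday-anchored (freq="W"), and 0.0 is just one possible fill constant.

```python
import numpy as np
import pandas as pd

weeks = pd.date_range("2021-01-03", periods=6, freq="W")

# Toy data: A is complete, B has NaN values, C has whole weeks absent.
df = pd.concat(
    [
        pd.DataFrame({"unique_id": "A", "ds": weeks,
                      "y": [6.0, 1.0, 4.0, 3.0, 2.0, 5.0]}),
        pd.DataFrame({"unique_id": "B", "ds": weeks,
                      "y": [1.0, np.nan, 2.0, 3.0, np.nan, 1.0]}),
        pd.DataFrame({"unique_id": "C", "ds": weeks[[0, 1, 4, 5]],
                      "y": [2.0, 2.0, 1.0, 3.0]}),
    ],
    ignore_index=True,
)

# Every week between the first and last date in the data.
all_weeks = pd.date_range(df["ds"].min(), df["ds"].max(), freq="W")

# count() skips NaN, so this catches both NaN values and absent rows.
valid_counts = df.groupby("unique_id")["y"].count()
incomplete = valid_counts[valid_counts < len(all_weeks)].index

# Option 1: drop every series that has missing data points.
df_dropped = df[~df["unique_id"].isin(incomplete)]

# Option 2: build the full (series, week) grid and fill gaps with a constant.
full_index = pd.MultiIndex.from_product(
    [df["unique_id"].unique(), all_weeks], names=["unique_id", "ds"]
)
df_filled = df.set_index(["unique_id", "ds"]).reindex(full_index).reset_index()
df_filled["y"] = df_filled["y"].fillna(0.0)

print(list(incomplete))
print(df_dropped["unique_id"].unique())
print(df_filled.groupby("unique_id")["y"].count())
```

Either df_dropped or df_filled could then be passed to model.fit in place of the original frame. Whether a zero fill is appropriate depends on whether a missing week really means no returns.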

Thanks!