Unable to implement Holt-Winters Method using statsmodels library


I have one month of daily data that records CPU utilization, and I want to produce forecasts from it. I have split the data into two parts: train, which takes the first 15 days, and test, which takes the last 16 days. I want to forecast from the training set and compare the forecast with the actual values for the last 16 days. So far I have tried simpler approaches such as moving average and simple exponential smoothing. Now I want to try something more complex and accurate, such as the Holt-Winters method and an ARIMA model. Below is the result that I get for Holt's linear trend method, which takes trend into account but not seasonality.

[Image: plot of the Holt's linear trend forecast]

Now I want to implement the Holt-Winters method, which is one of the preferred forecasting techniques. Here is the code:

# get the first 15 days
df_train = psql.read_sql("SELECT date,cpu FROM {} where date between '{}' and '{} 23:59:59';".format(conf_list[1], '2018-03-02', '2018-03-16'), conn).fillna(0)
df_train["date"] = pd.to_datetime(df_train["date"], format="%m-%d-%Y")
df_train.set_index("date", inplace=True)
df_train = df_train.resample('D').mean().fillna(0)

# get the last 15 days
df_test = psql.read_sql("SELECT date,cpu FROM {} where date between '{}' and '{} 23:59:59';".format(conf_list[1], '2018-03-18', '2018-03-31'), conn).fillna(0)
df_test["date"] = pd.to_datetime(df_test["date"], format="%m-%d-%Y")
df_test.set_index("date", inplace=True)
df_test = df_test.resample('D').mean().fillna(0)

Here is the code for the Holt-Winters method:

y_hat_avg = df_test.copy()
fit1 = ExponentialSmoothing(np.asarray(df_train['cpu']), seasonal_periods=1, trend='add', seasonal='add',).fit()
y_hat_avg['Holt_Winter'] = fit1.forecast(len(df_test))
plt.figure(figsize=(16,8))
plt.plot(df_train['cpu'], label='Train')
plt.plot(df_test['cpu'], label='Test')
plt.plot(y_hat_avg['Holt_Winter'], label='Holt_Winter')
plt.legend(loc='best')
plt.show()

Now I am getting an error for the seasonal_periods parameter. It accepts an integer, and I believe it expects a number of months as its value. Even the documentation only refers to it as the number of seasons: http://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html#statsmodels.tsa.holtwinters.ExponentialSmoothing

Since I have only one month of data, of which I want to use the first 15 days for the forecast, what season value should I pass? Assuming seasons refer to months, ideally it should be 0.5 (15 days), but it only accepts integers. If I pass the value as 1, I get the error below:

Traceback (most recent call last):
  File "/home/souvik/PycharmProjects/Pandas/forecast_health.py", line 89, in <module>
    fit1 = ExponentialSmoothing(np.asarray(df_train['cpu']), seasonal_periods=1, trend='add', seasonal='add',).fit()
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/tsa/holtwinters.py", line 571, in fit
    Ns=20, full_output=True, finish=None)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2831, in brute
    Jout = vecfunc(*grid)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/numpy/lib/function_base.py", line 2755, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/numpy/lib/function_base.py", line 2831, in _vectorize_call
    outputs = ufunc(*inputs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2825, in _scalarfunc
    return func(params, *args)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/tsa/holtwinters.py", line 207, in _holt_win_add_add_dam
    return sqeuclidean((l + phi * b) + s[:-(m - 1)], y)
ValueError: operands could not be broadcast together with shapes (16,) (0,)

If I pass the parameter as None, I get the error below:

Traceback (most recent call last):
  File "/home/souvik/PycharmProjects/Pandas/forecast_health.py", line 89, in <module>
    fit1 = ExponentialSmoothing(np.asarray(df_train['cpu']), seasonal_periods=None, trend='add', seasonal='add',).fit()
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/tsa/holtwinters.py", line 399, in __init__
    'Unable to detect season automatically')
NotImplementedError: Unable to detect season automatically

How do I get the forecast for the last 16 days of the month with the Holt-Winters method? What am I doing wrong?

Here is the data for the month if anyone wants to reproduce the results

                                cpu
date                               
2018-03-01 00:00:00+00:00  1.060606
2018-03-02 00:00:00+00:00  1.014035
2018-03-03 00:00:00+00:00  1.048611
2018-03-04 00:00:00+00:00  1.493392
2018-03-05 00:00:00+00:00  3.588957
2018-03-06 00:00:00+00:00  2.500000
2018-03-07 00:00:00+00:00  5.265306
2018-03-08 00:00:00+00:00  0.000000
2018-03-09 00:00:00+00:00  3.062099
2018-03-10 00:00:00+00:00  5.861751
2018-03-11 00:00:00+00:00  0.000000
2018-03-12 00:00:00+00:00  0.000000
2018-03-13 00:00:00+00:00  7.235294
2018-03-14 00:00:00+00:00  4.011662
2018-03-15 00:00:00+00:00  3.777409
2018-03-16 00:00:00+00:00  5.754559
2018-03-17 00:00:00+00:00  4.273390
2018-03-18 00:00:00+00:00  2.328782
2018-03-19 00:00:00+00:00  3.106048
2018-03-20 00:00:00+00:00  5.584877
2018-03-21 00:00:00+00:00  9.869841
2018-03-22 00:00:00+00:00  5.588215
2018-03-23 00:00:00+00:00  3.620377
2018-03-24 00:00:00+00:00  3.468021
2018-03-25 00:00:00+00:00  2.605649
2018-03-26 00:00:00+00:00  3.670559
2018-03-27 00:00:00+00:00  4.071777
2018-03-28 00:00:00+00:00  4.159690
2018-03-29 00:00:00+00:00  4.364939
2018-03-30 00:00:00+00:00  4.743253
2018-03-31 00:00:00+00:00  4.928571

There is 1 best solution below


First of all, the error NotImplementedError: Unable to detect season automatically appears because you set seasonal_periods to None while still leaving the seasonal parameter as add. You should change seasonal to None as well.

If your data has monthly seasonality and you have only one month of it, then you probably have no observable seasonality in your sample at all. If you want, you can check by plotting the Fourier transform of your data and looking for a seasonal peak.
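One rough way to run that check with NumPy is sketched here. The helper name dominant_period is hypothetical, and the approach (a plain periodogram of the mean-centered series) is only one of several possible choices. With just 31 points the frequency resolution is coarse, so any peak should be read as a hint rather than proof of seasonality.

```python
import numpy as np

def dominant_period(values, sample_spacing=1.0):
    """Return (period, freqs, power) for the strongest non-zero frequency.

    Hypothetical helper: computes a simple periodogram via the real FFT.
    """
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()           # drop the zero-frequency offset
    power = np.abs(np.fft.rfft(centered)) ** 2  # periodogram (unnormalized)
    freqs = np.fft.rfftfreq(len(values), d=sample_spacing)  # cycles per sample
    peak = np.argmax(power[1:]) + 1             # skip the zero-frequency bin
    return 1.0 / freqs[peak], freqs, power

# The 31 daily CPU values from the question.
cpu = [
    1.060606, 1.014035, 1.048611, 1.493392, 3.588957, 2.500000, 5.265306,
    0.000000, 3.062099, 5.861751, 0.000000, 0.000000, 7.235294, 4.011662,
    3.777409, 5.754559, 4.273390, 2.328782, 3.106048, 5.584877, 9.869841,
    5.588215, 3.620377, 3.468021, 2.605649, 3.670559, 4.071777, 4.159690,
    4.364939, 4.743253, 4.928571,
]

period_days, freqs, power = dominant_period(cpu)
print("strongest period (days):", period_days)
```

Plotting power against freqs (for example with plt.plot(freqs[1:], power[1:])) shows whether one peak clearly stands out from the rest.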

Also, I believe that for prediction (in-sample, as I see from your example) with statsmodels it is better to use predict instead of forecast; they yield different results in many cases.
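As for the first traceback in the question (seasonal_periods=1), the failing line slices the seasonal array with s[:-(m - 1)]. When m is 1 that becomes s[:-0], which is the same as s[:0] and is therefore empty, which matches the (0,) shape in the error. The sketch below reproduces only that slicing behavior; the assumed length of s (n_obs + m - 1) and the function name are illustrative guesses, not taken from the statsmodels source.

```python
import numpy as np

def trimmed_seasonal_length(n_obs, m):
    """Length of s[:-(m - 1)] for a seasonal array s.

    Assumption for illustration: s holds n_obs + m - 1 entries.
    Only the slice expression itself comes from the traceback.
    """
    s = np.zeros(n_obs + m - 1)
    return s[:-(m - 1)].shape[0]

print(trimmed_seasonal_length(16, 1))  # 0: s[:-0] is an empty slice
print(trimmed_seasonal_length(16, 7))  # 16: matches the 16 observations
```

So a value of 1 describes no seasonal cycle at all; either drop the seasonal component or use a period of at least 2 that the data can actually cover more than once.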