I have one month of daily data that records CPU utilization, and I want to produce a forecast from it. I have split the data into two parts: train, which takes the first 15 days, and test, which takes the last 16 days. I want to forecast from the training set and compare the result with the actual values for the last 16 days. So far I have tried simple approaches such as moving average and simple exponential smoothing. Now I want to try something more sophisticated, such as the Holt-Winters method and an ARIMA model. Below is the result that I get for Holt's linear trend method, which takes trend into account.
Now I want to implement the Holt-Winters method, which also models seasonality and is one of the preferred forecasting techniques. Here is the code that loads the data:
# get the first 15 days
df_train = psql.read_sql("SELECT date,cpu FROM {} where date between '{}' and '{} 23:59:59';".format(conf_list[1], '2018-03-02', '2018-03-16'), conn).fillna(0)
df_train["date"] = pd.to_datetime(df_train["date"], format="%m-%d-%Y")
df_train.set_index("date", inplace=True)
df_train = df_train.resample('D').mean().fillna(0)
# get the test period (2018-03-18 to 2018-03-31)
df_test = psql.read_sql("SELECT date,cpu FROM {} where date between '{}' and '{} 23:59:59';".format(conf_list[1], '2018-03-18', '2018-03-31'), conn).fillna(0)
df_test["date"] = pd.to_datetime(df_test["date"], format="%m-%d-%Y")
df_test.set_index("date", inplace=True)
df_test = df_test.resample('D').mean().fillna(0)
Here is the code for the Holt-Winters method:
y_hat_avg = df_test.copy()
fit1 = ExponentialSmoothing(np.asarray(df_train['cpu']), seasonal_periods=1, trend='add', seasonal='add',).fit()
y_hat_avg['Holt_Winter'] = fit1.forecast(len(df_test))
plt.figure(figsize=(16,8))
plt.plot(df_train['cpu'], label='Train')
plt.plot(df_test['cpu'], label='Test')
plt.plot(y_hat_avg['Holt_Winter'], label='Holt_Winter')
plt.legend(loc='best')
plt.show()
Now I am getting an error for the seasonal_periods parameter. It accepts an integer, and I believe it expects a number of months. Even the documentation only describes it as the number of seasons: http://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html#statsmodels.tsa.holtwinters.ExponentialSmoothing
Since I have only one month of data, and I want to base the forecast on the first 15 days, what value should I pass? Assuming seasons refer to months, ideally it should be 0.5 (15 days), but it only accepts integers. If I pass the value 1, I get the error below:
Traceback (most recent call last):
File "/home/souvik/PycharmProjects/Pandas/forecast_health.py", line 89, in <module>
fit1 = ExponentialSmoothing(np.asarray(df_train['cpu']), seasonal_periods=1, trend='add', seasonal='add',).fit()
File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/tsa/holtwinters.py", line 571, in fit
Ns=20, full_output=True, finish=None)
File "/home/souvik/data_analysis/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2831, in brute
Jout = vecfunc(*grid)
File "/home/souvik/data_analysis/lib/python3.5/site-packages/numpy/lib/function_base.py", line 2755, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/home/souvik/data_analysis/lib/python3.5/site-packages/numpy/lib/function_base.py", line 2831, in _vectorize_call
outputs = ufunc(*inputs)
File "/home/souvik/data_analysis/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2825, in _scalarfunc
return func(params, *args)
File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/tsa/holtwinters.py", line 207, in _holt_win_add_add_dam
return sqeuclidean((l + phi * b) + s[:-(m - 1)], y)
ValueError: operands could not be broadcast together with shapes (16,) (0,)
If I pass the parameter as None, I get the error below:
Traceback (most recent call last):
File "/home/souvik/PycharmProjects/Pandas/forecast_health.py", line 89, in <module>
fit1 = ExponentialSmoothing(np.asarray(df_train['cpu']), seasonal_periods=None, trend='add', seasonal='add',).fit()
File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/tsa/holtwinters.py", line 399, in __init__
'Unable to detect season automatically')
NotImplementedError: Unable to detect season automatically
How do I get the forecast for the last 16 days of the month with the Holt-Winters method? What am I doing wrong?
Here is the data for the month, if anyone wants to reproduce the results:
cpu
date
2018-03-01 00:00:00+00:00 1.060606
2018-03-02 00:00:00+00:00 1.014035
2018-03-03 00:00:00+00:00 1.048611
2018-03-04 00:00:00+00:00 1.493392
2018-03-05 00:00:00+00:00 3.588957
2018-03-06 00:00:00+00:00 2.500000
2018-03-07 00:00:00+00:00 5.265306
2018-03-08 00:00:00+00:00 0.000000
2018-03-09 00:00:00+00:00 3.062099
2018-03-10 00:00:00+00:00 5.861751
2018-03-11 00:00:00+00:00 0.000000
2018-03-12 00:00:00+00:00 0.000000
2018-03-13 00:00:00+00:00 7.235294
2018-03-14 00:00:00+00:00 4.011662
2018-03-15 00:00:00+00:00 3.777409
2018-03-16 00:00:00+00:00 5.754559
2018-03-17 00:00:00+00:00 4.273390
2018-03-18 00:00:00+00:00 2.328782
2018-03-19 00:00:00+00:00 3.106048
2018-03-20 00:00:00+00:00 5.584877
2018-03-21 00:00:00+00:00 9.869841
2018-03-22 00:00:00+00:00 5.588215
2018-03-23 00:00:00+00:00 3.620377
2018-03-24 00:00:00+00:00 3.468021
2018-03-25 00:00:00+00:00 2.605649
2018-03-26 00:00:00+00:00 3.670559
2018-03-27 00:00:00+00:00 4.071777
2018-03-28 00:00:00+00:00 4.159690
2018-03-29 00:00:00+00:00 4.364939
2018-03-30 00:00:00+00:00 4.743253
2018-03-31 00:00:00+00:00 4.928571
First of all, the error
NotImplementedError: Unable to detect season automatically
appears because you have set seasonal_periods to None while still passing seasonal='add'. You should set seasonal to None as well. If your data has monthly seasonality and you have only one month of it, then you probably cannot observe any seasonality in your sample at all. If you want, you can check by plotting the Fourier transform of your data and looking for a seasonal peak.
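As a rough sketch of the Fourier check mentioned above, the snippet below computes a periodogram of the 31 daily values posted in the question, using NumPy only. The variable names are my own, and with so few points the spectrum is very coarse, so any peak should be read as a hint rather than as evidence of seasonality.

```python
import numpy as np

# Daily CPU values for 2018-03-01 .. 2018-03-31, copied from the question.
cpu = np.array([
    1.060606, 1.014035, 1.048611, 1.493392, 3.588957, 2.500000, 5.265306,
    0.000000, 3.062099, 5.861751, 0.000000, 0.000000, 7.235294, 4.011662,
    3.777409, 5.754559, 4.273390, 2.328782, 3.106048, 5.584877, 9.869841,
    5.588215, 3.620377, 3.468021, 2.605649, 3.670559, 4.071777, 4.159690,
    4.364939, 4.743253, 4.928571,
])

# Remove the mean so the zero-frequency term does not dominate the spectrum.
centered = cpu - cpu.mean()

# Real FFT; with a sampling interval of 1 day, frequencies are in cycles/day.
spectrum = np.fft.rfft(centered)
freqs = np.fft.rfftfreq(len(centered), d=1.0)
power = np.abs(spectrum) ** 2

# Skip the zero frequency and convert the remaining ones to periods in days.
periods = 1.0 / freqs[1:]
nonzero_power = power[1:]

# Show the three strongest components, strongest first.
for i in np.argsort(nonzero_power)[::-1][:3]:
    print("period ~ {:.1f} days, power {:.1f}".format(periods[i], nonzero_power[i]))
```

To plot it instead of printing, something like plt.stem(periods, nonzero_power) should work. A clear, isolated peak near 7 days would suggest a weekly pattern; a flat or noisy spectrum suggests there is no seasonal component worth modelling.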
Also, I believe that for prediction (in-sample, as I see from your example) with Statsmodels it is better to use predict instead of forecast; they yield different results in many cases.