Python + Facebook Prophet: error in forecasting


What am I doing wrong? I feed a time series into Prophet and get a forecast, but it is just a repeating pattern and looks nothing like a real forecast.

result of forecasting

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
from sktime.forecasting.fbprophet import Prophet

salesData = [-22.899810546632665, -9.11228684600458, -2.0232803049199948, 2.769880807257286, 0.8771850621655833, 5.240523426843543, -4.101202313494994, -7.438606008637052, 1.8640765345461658, 2.6530910373074446, 0.6816397761862772, -5.369005299189235, 2.2995963745863106, -0.9488515556078191, -2.490658867190924, 1.495849663486175, 1.849161620028351, 4.574290696478775, 1.8606830281445728, 4.5401410593181915, -5.0697333665656465, -6.8037280937002205, -0.41728360333518366, -0.397468128796101, 1.0964987155515669, 0.9241856998562122, 7.4987636157925674, -0.9036386621033988, -4.168575486734736, 1.0455838313276498, 3.5501944037387263, 3.7117838928135645, -2.249350543191892, -0.9325974026874418, -7.311336798694246, -4.769395107147262, 1.7712018973129169, 2.6453558795159933, 4.805405561414102, 3.8260210836580826, 4.394563377865766, -7.139777583888209, -5.2122838464550005, 3.4707574766029285, 2.123455819765237, -1.9216781708796522, -2.696474264481818, 4.3072928655137765, -4.835464310939693, -4.715567254042179, 4.837730825402849, 7.923534727836371, 2.584596049852443, -3.485498284318281, -0.1855356912460123, -6.521909243801553, -5.939170879395363, 1.6440237896855472, 3.580429853920485, 1.3774941555516405, -0.9196574985857207, 3.992221788802156, -1.935957074886787, -3.7997988733436756, 2.714021017101176, 3.0294525494024, -0.3930365150839221, -5.009292419867927, 0.7546979885019088, -2.1174967380732563, -4.788564073800437, -0.952874211429072, 2.781963488012231, 7.9459681649254525, 3.629590909086231, 1.7861643533664724, -4.624868314831825, -3.520074030081029, 0.6087172369876066, -1.1062737995516618, 2.6359835833191347, 3.3113477448178408, 1.689695851822031, -7.095504239394035, -3.5249810225744573, 6.588101250291994, 4.085385013734247, 1.4832832692866167, 0.9734299151513489, 2.6112162346070504, -4.9010306775769275, -4.901239447552297, 3.8950095820646213, 1.3406292538018294, -0.5282837546993199, 0.753952323998906, 5.169079271848939, -1.6201860123291287, -3.762162418130858, 
4.275051800352548, 1.1232101108209884, -2.346202168861502, -2.2826782569255832, 4.505890767755019, -1.8190665385734426, -7.658329819004368, 0.10987851344719599, 2.261897124488089, 1.5392501079425294, -1.5844040229997323, 3.683259856560565, 1.1829387289652118, -4.237938986869985, -1.8026666795474242, -1.7946250217271775, 1.2933788146545167, -1.0374578898470987, 0.3434342927014619, -1.7379029412348141, -2.7776369281015727, 1.6505959323578854, -0.9845160970786264, 2.73600663050934, 1.912088176390693, 1.5064291044360094, -4.185993981551647, -5.078603135175087, 1.580810068159558, -1.827676946096123, 1.630245976939615, 0.8767970006001486, 1.9191902082102914, -4.255257065345466, -2.2920106230775206, 6.452410589766132, 1.490241277720836, 0.29745142343220105, -2.022758354575668, 1.166445503203743, -5.432583501507066, -3.2137062880925975, 5.964950921188283, 2.647583725388037, -0.5602763181540208, -2.4389785201658296, 4.185419628541755, -0.47753222108873544, -1.7125950465824034, 2.7545209513686912, 0.342690874987082, -1.1297987908423226, -2.044073816608947, 3.2448098431419643, -1.3658253244506777, -3.0147128514534787, 1.4737794066696859, 2.1205697423773944, 3.048455865920152, 1.3545170452380675, 2.10067165714669, -3.793222879838366, -4.987808008915057, 0.5774951459683085, 1.8218849657934184, 3.307490703034141, 1.9780212087919815, 1.017752319426859, -3.987452442233335, -3.233130377966213, 3.114846702211204, 1.700873563247121, 1.4460065798295234, 1.2663580286331433, 0.587237520548546, -5.66622533912726, -6.23057647388975, 2.149189925604655, 2.414391072519844, 2.2696711137534034, 2.1172763760340816, 1.9510409807356304, -3.7813490807218475, -5.93040981385517, 1.4416758657765874, 1.2545519774333878, 1.9386196629746348, 1.2256240869780195, 0.43855005180747997, -3.8095930731545153, -4.143332375838307, 3.7552822457456716, 1.3420669360588162, 1.3597044716717293, 1.33624713051985, 2.379028078399799, -2.399594563350425, -4.541685080886469, 1.4735219894517995, -1.4039630434543284, 
0.66682849702559, 2.245620397491483, 4.4033376300176785, -1.0964722703121415, -4.076951580495554, 1.291026487846525, -1.0855622042189113, 2.021799003901511, 3.3846547403984886, 3.6166899070710716, -4.452017017323524, -5.947523546504931, 2.727763154202456, 1.5435457414281373, 1.9410359497085359, 1.1448645515499383, 2.511180115230774, -3.4770991100961717, -4.9485301299371995, 2.936927151150847, 1.2839461841529514, -0.07087615474581482, -2.8211352474973266, -0.22166213045039357, -1.6727202982802507, -2.4620856585437685, 2.9576801111982873, -1.0473392914946011, 0.18190414813076905, -1.2105705320239823, 0.9254667533052756, -1.9219262740652698, -3.1279930246559458, 1.895358870235939, -2.3953798117147485, 0.6985464824740497, -0.7965075353915932, 1.9591944951226141, -2.517142588893044, -1.676673849478618, 2.84604398527469, -1.0894803442635672, 1.4783835295440593, -0.0555177047161724, 1.6056866092191024, -4.77750003577151, -1.544584491913734, 3.107665546950072, -0.006254614425190333, 0.22423083652751807, 1.7005777930167314, 4.063346193604185, -3.6485240780514268, -2.3458028072183454, 2.520911245490198, 1.9419752563460058, 0.3787820835300505, 2.0498064393412077, 3.147056229388338, -4.013310003373203, -4.362542880984366, 1.5791861684970858, 4.335046506280617, 3.024238669831118, 2.6683220831269496, 0.31176370249733526, -5.618366178783114, -5.267583273223407, 1.7326529550861156, 4.382096829624017, 2.0455805167051846, 1.0366320874674322, -1.9173315713803762, -4.0357688468957615, -2.8014441825258554, 2.565601196897678, 2.2515635294504164, 0.38499715498107095, 2.2908723588858173, -0.8979113432652993, -2.4615590649006482, -4.036513305578183, 1.1582182715691045, -0.8529716577272682, 1.0479873029226106, 6.53669448953843, 2.0355434398513306, -2.617144384775319, -8.06489443517519, 1.3207354479678062, -0.8506484224720485, 3.5640871478901244, 4.775311871095769, 1.147305365017473, -4.285285935325304, -7.629241354464176, 4.545770169990173, -0.027498715864489598, 4.46282927184866, 
1.4592336747437304, 3.3706422928017794, -3.3689345604601484, -4.565641812297128, 3.1252066027077245, -2.4158556661884467, 2.37788801634457, 1.1416295182577751, 7.36482011925607, -2.76449323711107, -4.7092182478193525, -1.07666105787791, -1.0243511859113654, 4.565943078068873, 2.2147757486814985, 3.865577422352626, -7.230372724333538, -3.8398148291606216, 0.9939239884760955, 1.4333479713928579, 2.5779115150194096, 0.8221690914339992, 2.4098682578136152, -6.629744793568462, -1.5110744533130211, 2.156484249766369, 1.523300217711056, -1.1811913084632777, 0.4090603117126238, 3.3805594527547562, -4.322160577084377, -1.9905151201929976, -0.870049065627476, 1.5971712874405182, -0.28835577991007966, 3.543064453803165, 2.4565420116791223, -6.427468387594416, -5.704961705197211, -2.3371090366067686, 3.566469474778068, 1.9648586802921713, 6.430843208262392, -2.0029942672566983, -8.421587742473228, -6.414213371184237, 2.6665441248117823, 7.065714062786268, 1.4156562691223726, 3.688760151381901, -6.627341388200704, -3.2205813402830885, -1.8073613922283878, 5.3895839529841405, -1.2095943247915644, -1.1067957899354275, 7.444579017378822, 1.610983987241812, -0.5864043533909686, -6.676970411917687, 2.521362831867282, -5.1450562775781705, 5.7475117965483085, 8.371715838155241, 1.9229179356836248, -7.628853886388433]

# Convert the data to the input format Prophet expects
salesData = [np.float64(item) for item in salesData]
test_date = datetime.datetime.strptime("2022-01-01", "%Y-%m-%d")
# Generate one daily date per observation so the index length always matches the data
listOfDates = pd.date_range(test_date, periods=len(salesData))
# Build the series for Prophet input
ddf = pd.Series(salesData, index=listOfDates)

#Prophet begin
forecaster = Prophet(seasonality_mode='multiplicative',
                     weekly_seasonality=True
                     )
forecaster.fit(ddf)
y_pred = forecaster.predict(fh=range(0,50))
plt.plot(range(0,144),salesData[:144], label='Original')
plt.plot(range(144,194),y_pred, label='Prophet forecast')
plt.legend(loc='best')
plt.show()

There are 2 best solutions below

8
geometricfreedom On

Welcome to Stack Overflow, Andrey. I'll give you a somewhat detailed answer.

  1. Your time series does not look preprocessed for a forecasting task. Why do you enter the data manually? Try to clean it up a bit, and take a closer look at the first data point: the series starts at about -22, which looks like an anomaly. Try to normalize or rescale the data.

  2. Prophet is not the forecasting model you want here, for several reasons. The short, non-theoretical answer is: Prophet works badly if you don't know its underlying mechanism. Try the ARIMA family first (AR, ARMA, ARIMA, SARIMA), or ETS.

  3. You set the seasonality to multiplicative, but your time series doesn't show a multiplicative pattern. I would say it is additive, since the amplitude does not increase over time. You also don't seem to know anything about the seasonality; it looks like a random, noisy time series to me. Run it without any seasonality.
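To make the additive versus multiplicative distinction in point 3 concrete, here is a minimal sketch on synthetic data. It assumes the usual formulations y = trend + seasonal (additive) and y = trend * (1 + seasonal) (multiplicative); the trend, the weekly pattern and the amplitudes are invented for illustration and are not taken from the question.

```python
import numpy as np

t = np.arange(112)                           # 16 weeks of daily steps (hypothetical)
trend = 10.0 + 0.5 * t                       # hypothetical rising trend
weekly = np.sin(2 * np.pi * t / 7)           # hypothetical weekly pattern

additive = trend + 3.0 * weekly              # seasonal swing stays constant
multiplicative = trend * (1 + 0.3 * weekly)  # seasonal swing scales with the trend


def swing(y, level):
    """Peak-to-peak size of the series after removing the trend."""
    resid = y - level
    return float(resid.max() - resid.min())


first, last = slice(0, 28), slice(84, 112)   # first and last four weeks

add_first, add_last = swing(additive[first], trend[first]), swing(additive[last], trend[last])
mul_first, mul_last = swing(multiplicative[first], trend[first]), swing(multiplicative[last], trend[last])

print("additive swing:      ", add_first, add_last)
print("multiplicative swing:", mul_first, mul_last)
```

In the additive case the swing is the same at the start and at the end; in the multiplicative case it grows with the level. Under the same formulation, a series centred near zero gives the multiplicative term very little level to scale, which is one more reason to try the additive mode first.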

In the end, no model will help you if the data is not suitable.
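One quick way to check whether a daily series is suitable for a seasonal model is to look at its autocorrelation at lag 7 before fitting anything. The sketch below uses two synthetic series (a made-up weekly pattern plus noise, and pure noise); the helper name `autocorr` and all numbers are illustrative assumptions, not part of the question's data.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


rng = np.random.default_rng(0)

# Hypothetical weekly shape: five ordinary days, two low days (sums to zero)
pattern = np.array([1.0, 1.5, 1.0, 2.0, 1.5, -3.5, -3.5])
weekly_series = np.tile(pattern, 20) + rng.normal(0.0, 0.5, 140)
noise_only = rng.normal(0.0, 2.0, 140)

weekly_acf7 = autocorr(weekly_series, 7)
noise_acf7 = autocorr(noise_only, 7)

print("lag-7 autocorrelation, weekly series:", round(weekly_acf7, 3))
print("lag-7 autocorrelation, pure noise:   ", round(noise_acf7, 3))
```

A value close to 1 at lag 7 points to a weekly cycle worth modelling; a value near 0 suggests there is no weekly structure for a seasonal term to pick up.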

EDIT: Here is an addition to your approach, without assuming any train/test split:

  • Normalizing
  • Training an ARIMA(7,0,7) and a SARIMA(1,0,1)(1,0,1) with s=7
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA

import matplotlib.pyplot as plt
import seaborn as sns

tslist = [8.5, 28.0, 29.0, 40.0, 38.5, 43.0, 31.5, 29.0, 41.0, 35.5, 39.5, 31.5, 41.5, 30.0, 30.5, 36.0, 39.5, 44.0, 37.5, 42.5, 30.0, 31.0, 41.0, 38.0, 40.5, 35.0, 41.5, 36.0, 36.5, 35.0, 44.0, 42.5, 40.0, 32.5, 26.5, 30.0, 34.5, 42.5, 40.5, 43.5, 39.5, 31.0, 30.0, 38.5, 40.0, 39.0, 33.0, 44.0, 29.0, 32.0, 35.0, 46.0, 31.5, 30.5, 32.5, 29.0, 29.5, 35.5, 39.5, 40.5, 35.5, 36.5, 28.0, 28.5, 38.5, 36.5, 26.5, 25.5, 31.5, 30.0, 30.0, 35.0, 39.0, 51.5, 38.0, 42.5, 29.5, 28.0, 41.0, 35.0, 35.0, 39.5, 37.5, 24.5, 31.0, 41.0, 38.5, 28.0, 36.0, 40.5, 31.5, 30.0, 42.5, 36.0, 32.5, 37.5, 36.0, 30.0, 30.0, 36.5, 28.0, 31.0, 26.5, 38.5, 29.13, 15.5, 29.0, 38.5, 30.0, 30.0, 35.5, 31.5, 30.0, 30.0, 30.0, 30.0, 29.0, 30.0, 30.0, 29.5, 30.0, 30.0, 36.0, 36.0, 31.5, 30.0, 24.0, 30.0, 27.0, 34.0, 31.0, 32.0, 29.0, 31.0, 37.5, 34.5, 34.5, 25.0, 35.0, 22.0, 26.0, 37.5, 34.0, 25.0, 25.0, 34.0, 31.0, 31.0, 35.5, 27.0, 29.0, 27.5, 34.0, 30.0, 28.0, 28.5, 30.0, 36.0, 31.0, 35.0, 27.5, 26.5, 33.5, 31.0, 37.0, 36.0, 28.0, 29.0, 30.0, 29.0, 30.5, 29.0, 30.0, 29.0, 25.0, 17.0, 30.0, 35.0, 30.5, 30.0, 30.0, 24.0, 21.5, 31.0, 26.0, 27.0, 32.0, 30.0, 25.0, 21.0, 33.5, 30.0, 29.5, 30.5, 29.0, 21.5, 24.0, 28.5, 23.0, 31.0, 32.5, 30.0, 29.5, 24.0, 30.0, 30.0, 29.0, 30.0, 34.0, 23.0, 24.5, 34.0, 29.0, 30.5, 27.5, 30.5, 23.0, 25.0, 29.5, 31.5, 29.0, 19.0, 25.5, 27.5, 23.5, 32.0, 21.5, 28.5, 27.5, 19.5, 27.0, 22.0, 31.0, 24.5, 31.5, 25.0, 32.0, 28.0, 30.0, 30.0, 27.0, 32.0, 28.5, 32.0, 26.5, 27.5, 30.0, 29.5, 30.0, 33.5, 37.0, 29.5, 29.0, 35.5, 36.0, 30.5, 35.5, 35.0, 28.5, 32.0, 34.0, 37.0, 31.0, 33.0, 34.5, 25.5, 28.0, 34.0, 36.0, 36.0, 36.0, 28.54, 22.5, 26.5, 37.0, 35.0, 35.5, 37.0, 35.5, 31.5, 31.5, 36.5, 36.0, 32.0, 51.33, 36.5, 32.0, 30.0, 36.0, 36.5, 37.0, 41.5, 34.5, 29.5, 28.0, 40.0, 39.5, 44.5, 35.5, 45.0, 29.5, 35.5, 41.5, 32.5, 34.5, 37.0, 51.5, 32.5, 27.0, 25.5, 40.5, 43.5, 37.5, 40.5, 28.0, 34.0, 41.5, 38.0, 38.5, 33.0, 41.0, 30.0, 33.5, 43.0, 40.5, 39.5, 42.5, 40.0, 30.5, 32.0, 33.0, 
41.5, 34.5, 32.0, 42.0, 27.5, 24.0, 35.5, 34.5, 36.0, 40.5, 30.1, 21.0, 23.5, 37.0, 38.0, 36.5, 39.5, 20.5, 29.5, 33.5, 35.0, 26.0, 28.5, 39.5, 35.0, 31.0, 29.5, 32.5, 21.0, 36.0, 36.0, 30.5, 28.5, 32.5]

ts = pd.DataFrame(tslist)
ts.set_index(pd.date_range(start='2023-01-01', end='2023-12-31', freq='D'), inplace=True)
ts["normalized"] = (ts[0] - ts[0].mean()) / ts[0].std()
mod1 = ARIMA(ts["normalized"], order=(7,0,7)).fit()
mod2 = ARIMA(ts["normalized"], order=(1,0,1), seasonal_order=(1,0,1,7)).fit()

forecast_30 = pd.Series(mod1.predict('2024-01-01', '2024-01-30', dynamic=False))
forecast_30_sarima = pd.Series(mod2.predict('2024-01-01', '2024-01-30', dynamic=False))
ts_new = pd.DataFrame( pd.concat([ts["normalized"], forecast_30]) )
ts_new["flag"] = np.where(ts_new.index < '2024-01-01', 'truth', 'forecast')

ts_new_sarima = pd.DataFrame( forecast_30_sarima )
ts_new_sarima["flag"] = 'forecast_sarima'

This gives the forecast plot below, zoomed in from about October.

plt.figure(figsize=(12, 8))
fig = sns.lineplot(x=ts_new.index, y=ts_new[0], hue=ts_new["flag"], alpha=.7)
fig.set_xlim([ts_new.index[270], ts_new.index[394]])


fig = sns.lineplot(x=ts_new_sarima.index, y=ts_new_sarima["predicted_mean"], hue=ts_new_sarima["flag"], palette=["g"], alpha=.7)
fig.set_title("ARIMA(7,0,7) & SARIMA(1,0,1)(1,0,1) s=7 with normalized data")

See forecast plot here
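Note that the forecasts above are on the normalized scale. A minimal sketch of mapping them back to the original units follows, using only the standard library; the helper names are hypothetical, and `statistics.stdev` is the sample standard deviation, which matches the pandas default used in the normalization step. The seven values are the first week of `tslist`, and the z-scores passed to `denormalize` at the end are made up.

```python
from statistics import mean, stdev


def fit_scaler(values):
    """Return the mean and sample standard deviation used for z-scoring."""
    return mean(values), stdev(values)


def normalize(values, mu, sigma):
    """Z-score: subtract the mean, divide by the standard deviation."""
    return [(v - mu) / sigma for v in values]


def denormalize(z_values, mu, sigma):
    """Inverse of normalize: map z-scores back to the original units."""
    return [z * sigma + mu for z in z_values]


history = [8.5, 28.0, 29.0, 40.0, 38.5, 43.0, 31.5]  # first week of tslist
mu, sigma = fit_scaler(history)

z = normalize(history, mu, sigma)
restored = denormalize(z, mu, sigma)

# Hypothetical forecast z-scores, mapped back to sales units
forecast_units = denormalize([0.0, 1.0], mu, sigma)
print(forecast_units)
```

The scaler must be fitted on the training data only and reused unchanged for the forecasts; otherwise the back-transformed values are not comparable with the original series.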

0
Andrey Gulyaev On

I understand your point, but that is not quite the situation. I posted the data so that everyone can try it. The original time series is the following:

tslist = [8.5, 28.0, 29.0, 40.0, 38.5, 43.0, 31.5, 29.0, 41.0, 35.5, 39.5, 31.5, 41.5, 30.0, 30.5, 36.0, 39.5, 44.0, 37.5, 42.5, 30.0, 31.0, 41.0, 38.0, 40.5, 35.0, 41.5, 36.0, 36.5, 35.0, 44.0, 42.5, 40.0, 32.5, 26.5, 30.0, 34.5, 42.5, 40.5, 43.5, 39.5, 31.0, 30.0, 38.5, 40.0, 39.0, 33.0, 44.0, 29.0, 32.0, 35.0, 46.0, 31.5, 30.5, 32.5, 29.0, 29.5, 35.5, 39.5, 40.5, 35.5, 36.5, 28.0, 28.5, 38.5, 36.5, 26.5, 25.5, 31.5, 30.0, 30.0, 35.0, 39.0, 51.5, 38.0, 42.5, 29.5, 28.0, 41.0, 35.0, 35.0, 39.5, 37.5, 24.5, 31.0, 41.0, 38.5, 28.0, 36.0, 40.5, 31.5, 30.0, 42.5, 36.0, 32.5, 37.5, 36.0, 30.0, 30.0, 36.5, 28.0, 31.0, 26.5, 38.5, 29.13, 15.5, 29.0, 38.5, 30.0, 30.0, 35.5, 31.5, 30.0, 30.0, 30.0, 30.0, 29.0, 30.0, 30.0, 29.5, 30.0, 30.0, 36.0, 36.0, 31.5, 30.0, 24.0, 30.0, 27.0, 34.0, 31.0, 32.0, 29.0, 31.0, 37.5, 34.5, 34.5, 25.0, 35.0, 22.0, 26.0, 37.5, 34.0, 25.0, 25.0, 34.0, 31.0, 31.0, 35.5, 27.0, 29.0, 27.5, 34.0, 30.0, 28.0, 28.5, 30.0, 36.0, 31.0, 35.0, 27.5, 26.5, 33.5, 31.0, 37.0, 36.0, 28.0, 29.0, 30.0, 29.0, 30.5, 29.0, 30.0, 29.0, 25.0, 17.0, 30.0, 35.0, 30.5, 30.0, 30.0, 24.0, 21.5, 31.0, 26.0, 27.0, 32.0, 30.0, 25.0, 21.0, 33.5, 30.0, 29.5, 30.5, 29.0, 21.5, 24.0, 28.5, 23.0, 31.0, 32.5, 30.0, 29.5, 24.0, 30.0, 30.0, 29.0, 30.0, 34.0, 23.0, 24.5, 34.0, 29.0, 30.5, 27.5, 30.5, 23.0, 25.0, 29.5, 31.5, 29.0, 19.0, 25.5, 27.5, 23.5, 32.0, 21.5, 28.5, 27.5, 19.5, 27.0, 22.0, 31.0, 24.5, 31.5, 25.0, 32.0, 28.0, 30.0, 30.0, 27.0, 32.0, 28.5, 32.0, 26.5, 27.5, 30.0, 29.5, 30.0, 33.5, 37.0, 29.5, 29.0, 35.5, 36.0, 30.5, 35.5, 35.0, 28.5, 32.0, 34.0, 37.0, 31.0, 33.0, 34.5, 25.5, 28.0, 34.0, 36.0, 36.0, 36.0, 28.54, 22.5, 26.5, 37.0, 35.0, 35.5, 37.0, 35.5, 31.5, 31.5, 36.5, 36.0, 32.0, 51.33, 36.5, 32.0, 30.0, 36.0, 36.5, 37.0, 41.5, 34.5, 29.5, 28.0, 40.0, 39.5, 44.5, 35.5, 45.0, 29.5, 35.5, 41.5, 32.5, 34.5, 37.0, 51.5, 32.5, 27.0, 25.5, 40.5, 43.5, 37.5, 40.5, 28.0, 34.0, 41.5, 38.0, 38.5, 33.0, 41.0, 30.0, 33.5, 43.0, 40.5, 39.5, 42.5, 40.0, 30.5, 32.0, 33.0, 
41.5, 34.5, 32.0, 42.0, 27.5, 24.0, 35.5, 34.5, 36.0, 40.5, 30.1, 21.0, 23.5, 37.0, 38.0, 36.5, 39.5, 20.5, 29.5, 33.5, 35.0, 26.0, 28.5, 39.5, 35.0, 31.0, 29.5, 32.5, 21.0, 36.0, 36.0, 30.5, 28.5, 32.5]

These are daily bread sales in a grocery store, starting from 01/01/2023. On January 1, right after New Year, hardly anyone buys bread. Second, this original time series is decomposed into additive components using singular spectrum analysis (SSA), and the periodic component is fed to Prophet to make a forecast.

So the outlier at the beginning is simply the minimum sales value for the entire period.
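For readers who want to reproduce the decomposition step, here is a minimal sketch of basic singular spectrum analysis with NumPy. It is not the exact code used for the series above: the function name, the window length and the synthetic input are assumptions made for illustration.

```python
import numpy as np


def ssa_components(series, window):
    """Split a series into additive elementary SSA components.

    Returns an array of shape (n_components, len(series)) whose rows
    sum back to the original series.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)

    components = []
    for i in range(s.size):
        elem = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: average each anti-diagonal back into one value
        flipped = elem[::-1]
        comp = [flipped.diagonal(d).mean() for d in range(-window + 1, k)]
        components.append(comp)
    return np.array(components)


# Synthetic stand-in for a daily series with a weekly cycle (hypothetical values)
rng = np.random.default_rng(1)
t = np.arange(70)
series = 30.0 + 5.0 * np.sin(2 * np.pi * t / 7) + rng.normal(0.0, 1.0, t.size)

components = ssa_components(series, window=14)
print(components.shape)
```

Which rows are grouped into "trend", "periodicity" and "noise" is a modelling decision made after inspecting the singular values; the sketch only produces the elementary components.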

The series is stationary according to both the Dickey-Fuller (DF) test and the KPSS test.

P.S. Look: when I try to forecast the original time series, I get NO RESULT either (JPEG attached): Original TS forecast by Prophet