If I'm creating a Rolling Mean Feature based on my Sales (target) column, is it necessary to shift it?
Let me give an example:
Let's suppose I have days 01~10 in my dataset. If I create a rolling mean column with a 7-day window, the row for day 10 will include day 10's own Sales value in its rolling mean. Now, if I'm going to predict day 11, which is tomorrow, I would need the Sales value of that day in order to compute the rolling mean, which makes no sense.
So, in my opinion, it makes more sense to always take the last 7 days, not counting the current one.
Can anyone help?
I will assume that you can use the pandas library, as its powerful rolling function can easily accommodate your request.
Consider the following example:
Which results in
As you can see, this lets the mean of the indices [0, 1, 2] be displayed at index 3 ((1+2+3)/3 = 2). The NaNs at the beginning are there because the window function doesn't know what to do if its window doesn't completely overlap with the series.
We shifted the Series by one position before calculating the rolling transformation, something you wanted to avoid.
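A minimal sketch of the shift-then-roll approach described above. The series values and the window size of 3 are assumptions chosen to match the numbers in the explanation ((1+2+3)/3 = 2 at index 3), not necessarily the exact snippet used originally.

```python
import pandas as pd

# Illustrative values only (assumed for this sketch).
s = pd.Series([1, 2, 3, 4, 5, 6])

# Shift by one position so each row only sees earlier values,
# then take the mean over a window of 3 rows.
shifted_mean = s.shift(1).rolling(window=3).mean()

print(shifted_mean)
```

Indices 0 to 2 come out as NaN, because after the shift their windows still contain a missing value, and index 3 holds the mean of the original values at indices 0, 1 and 2.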
In your special case (which is that you shift by 1), the window function can be improved with the closed argument: closed="left" means that the current point is not part of the window's calculation, so only the points before it are used. (The naming can feel reversed: "left" and "right" refer to the two ends of the window, and with closed="left" the window is closed on its left end and open on its right end, which is where the current point sits. This is due to the maths behind it, I would just roll with it :D)
You can find the closed options here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#:~:text=DataFrame%20first%20instead.-,closed,-str%2C%20default%20None
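A short sketch comparing the two approaches, assuming a pandas version in which the closed argument is accepted for fixed-size windows (as far as I know, 1.2 or later). The Sales figures and dates below are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])  # illustrative values

# Window of 3 that excludes the current row ...
left_closed = s.rolling(window=3, closed="left").mean()
# ... which should match shifting by 1 and then rolling.
shift_then_roll = s.shift(1).rolling(window=3).mean()
pd.testing.assert_series_equal(left_closed, shift_then_roll)

# Applied to the question: 10 days of hypothetical Sales, 7-day window.
sales = pd.DataFrame(
    {"Sales": [10, 12, 9, 14, 11, 13, 15, 16, 12, 18]},
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)
sales["rolling_mean_7"] = sales["Sales"].rolling(window=7, closed="left").mean()

# Feature for day 11 (not yet in the data): mean of the last 7 known days.
next_day_feature = sales["Sales"].tail(7).mean()
```

With this setup, the row for day 8 holds the mean of days 1 to 7, and the feature for day 11 only needs Sales values that are already known.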