I have a dataframe with a timestamp column/index and I am calculating the moving average over the last 5 seconds.
df['Price'].rolling(window=time_diff, min_periods=1, closed='both').mean()
So far so good.
Now I also need to calculate the moving average for the next 5 seconds. However, my timestamps are not evenly spaced, so I can't just shift the dataframe to compute the second average.
The data looks like this:
    Timestamp            Price   Start  Stop
0   2019-01-02 08:30:00  56.565  0      5
1   2019-01-02 08:30:01  56.565  1      6
2   2019-01-02 08:30:02  56.565  2      6
3   2019-01-02 08:30:03  56.540  3      7
4   2019-01-02 08:30:04  56.545  4      7
5   2019-01-02 08:30:05  56.545  5      8
6   2019-01-02 08:30:07  56.540  6      10
7   2019-01-02 08:30:09  56.550  7      12
8   2019-01-02 08:30:10  56.545  8      12
9   2019-01-02 08:30:11  56.550  9      12
10  2019-01-02 08:30:12  56.570  10     13
For example, at index 5 the average over the last 5 seconds would be 56.5541. I also need to compute the average over the next 5 seconds excluding the current time, i.e. over indices 6, 7 and 8 (56.545).
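To make this reproducible, here is a minimal sketch that rebuilds the sample above and computes the backward-looking average. It assumes that the Timestamp column is the index and that time_diff is a 5-second pd.Timedelta; the column name avg_last_5s is only illustrative.

```python
import pandas as pd

# Rows copied from the sample table above (Start/Stop columns omitted).
timestamps = pd.to_datetime([
    "2019-01-02 08:30:00", "2019-01-02 08:30:01", "2019-01-02 08:30:02",
    "2019-01-02 08:30:03", "2019-01-02 08:30:04", "2019-01-02 08:30:05",
    "2019-01-02 08:30:07", "2019-01-02 08:30:09", "2019-01-02 08:30:10",
    "2019-01-02 08:30:11", "2019-01-02 08:30:12",
])
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]

# Assumption: the timestamps are the index, as time-based rolling requires.
df = pd.DataFrame({"Price": prices},
                  index=pd.DatetimeIndex(timestamps, name="Timestamp"))

# Assumption: time_diff is a 5-second Timedelta.
time_diff = pd.Timedelta(seconds=5)

# Backward-looking mean over [t - 5s, t]; closed='both' keeps both endpoints.
df["avg_last_5s"] = (
    df["Price"]
    .rolling(window=time_diff, min_periods=1, closed="both")
    .mean()
)

print(df)
```

At index 5 the window covers rows 0 to 5, which gives the backward average quoted above.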
Using df.index.get_indexer() I am able to extract the index of the last row to be included in the average:
df['stop'] = df.index.get_indexer(df['Date-Time-Exch'] + time_diff, method='bfill')
I was hoping that I could somehow use the values in 'start' and 'stop' for slicing with iloc, like
df.iloc[df['start']:df['stop'], 1].mean()
but this does not work.
Alternatively, I came up with this:
def get_indexes(time_index, offset):
    start, end = df.index.get_indexer([time_index, time_index + offset], method='bfill')
    avg = df.iloc[start + 1:end + 1, 1].mean()
    return avg
which, used with .apply(), is sadly far too slow to be useful.
Hopefully you can help me because I have been stuck on this problem for some time now.
You can calculate a forward-looking rolling average by reversing your dataframe, calculating the rolling average, and then reversing again. You also need to specify closed='left' (see the documentation) when doing this, since you don't want to include the current value in your forward average.
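A minimal sketch of the reverse, roll, reverse idea described above, using the sample data from the question. It assumes the timestamps are the index, that time_diff is a 5-second pd.Timedelta, and a reasonably recent pandas version that accepts a monotonically decreasing DatetimeIndex for time-based windows. The column names avg_next_5s and avg_next_5s_check are illustrative. The second half is not part of the approach above: it is a separate cross-check built on np.searchsorted and a cumulative sum, which also avoids .apply().

```python
import numpy as np
import pandas as pd

# Sample rows from the question (Start/Stop columns omitted).
timestamps = pd.to_datetime([
    "2019-01-02 08:30:00", "2019-01-02 08:30:01", "2019-01-02 08:30:02",
    "2019-01-02 08:30:03", "2019-01-02 08:30:04", "2019-01-02 08:30:05",
    "2019-01-02 08:30:07", "2019-01-02 08:30:09", "2019-01-02 08:30:10",
    "2019-01-02 08:30:11", "2019-01-02 08:30:12",
])
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]
df = pd.DataFrame({"Price": prices},
                  index=pd.DatetimeIndex(timestamps, name="Timestamp"))

# Assumption: time_diff is a 5-second Timedelta.
time_diff = pd.Timedelta(seconds=5)

# Reverse, roll, reverse back. On the reversed (decreasing) index,
# closed='left' makes the window cover (t, t + 5s]: the current row is
# excluded and a row exactly 5 seconds ahead is included.
df["avg_next_5s"] = (
    df["Price"][::-1]
    .rolling(window=time_diff, min_periods=1, closed="left")
    .mean()[::-1]
)

# Independent cross-check (a swapped-in technique, not the method above):
# locate each window with searchsorted and take means from a cumulative sum.
t = df.index.values                           # sorted datetime64 values
p = df["Price"].to_numpy()
csum = np.concatenate(([0.0], np.cumsum(p)))  # csum[k] = sum of p[:k]
lo = np.searchsorted(t, t, side="right")      # first row strictly after t
hi = np.searchsorted(t, t + np.timedelta64(5, "s"), side="right")  # one past last row <= t + 5s
count = hi - lo
check = np.full(len(p), np.nan)               # NaN where no future rows exist
has_rows = count > 0
check[has_rows] = (csum[hi] - csum[lo])[has_rows] / count[has_rows]
df["avg_next_5s_check"] = check

print(df)
```

For the sample data, index 5 averages rows 6, 7 and 8, which gives the 56.545 expected in the question; the last row has no future rows and comes out as NaN. The cross-check assumes a sorted index, and the index-aligned assignment of avg_next_5s assumes the timestamps are unique.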