pandas: how to aggregate records into rolling time windows at a given frequency?

Here is my data:

import numpy as np
import pandas as pd

times = pd.date_range(start=pd.Timestamp.now(), end=pd.Timestamp.now() + pd.Timedelta(minutes=1),
                      periods=61)
data = np.arange(61)
df = pd.DataFrame({'times': times, 'data': data})

output:

                           times  data
0  2024-03-20 10:38:44.100877000     0
1  2024-03-20 10:38:45.100877416     1
2  2024-03-20 10:38:46.100877833     2
3  2024-03-20 10:38:47.100878250     3
4  2024-03-20 10:38:48.100878666     4
..                           ...   ...
56 2024-03-20 10:39:40.100900333    56
57 2024-03-20 10:39:41.100900750    57
58 2024-03-20 10:39:42.100901166    58
59 2024-03-20 10:39:43.100901583    59
60 2024-03-20 10:39:44.100902000    60

If I want to group this with a rolling window of say 2 seconds I can do this:

df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2))
for window in df_windows:
    print(window)

Then I get this:

                            data
times                           
2024-03-20 10:48:09.273265     0
                               data
times                              
2024-03-20 10:48:09.273265000     0
2024-03-20 10:48:10.273265333     1
                               data
times                              
2024-03-20 10:48:10.273265333     1
2024-03-20 10:48:11.273265666     2
                               data
times                              
2024-03-20 10:48:11.273265666     2
2024-03-20 10:48:12.273266000     3
...

Cool. But what if I don't want a window computed for every single row? pandas seems to lack a feature for that. For example, a step parameter was added to rolling (https://github.com/pandas-dev/pandas/issues/15354), but it doesn't work with time-based windows:

df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2), step=2)

NotImplementedError: step is not supported with frequency windows

It also wouldn't make much sense here: 2 is not a meaningful step for a time-based window. The step should be a pd.Timedelta object, but the step argument has to be an integer.
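
To illustrate what the integer step means, here is a minimal sketch with a fixed-size (row-count) window, assuming pandas 1.5 or later, where step is available. Both window and step count rows here, so the result says nothing about how much time passes between evaluated windows on irregular data. The series s is made up for the illustration.

```python
import numpy as np
import pandas as pd

# Integer window plus integer step: both are measured in rows, not in time.
# Assumes pandas >= 1.5, where the `step` argument was introduced.
s = pd.Series(np.arange(6))

# The window is evaluated at every 2nd row only (rows 0, 2, 4).
stepped = s.rolling(window=2, step=2).sum()
print(stepped)
```

With irregular timestamps, "every 2nd row" can correspond to any amount of elapsed time, which is why an integer step is not a substitute for a time-based one.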

So it seems the rolling function cannot do what I want. What workaround is there in pandas? I would like one that works with irregular data, i.e. one that does not rely on my timestamps being at a regular frequency. I can use groupby to get time groups, but I don't see a way to get overlapping windows with groupby.
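
One possible workaround, sketched under assumptions rather than offered as a built-in pandas feature: lay the window ends on a regular time grid and look up the rows for each window with a binary search over the sorted timestamps. The helper name iter_time_windows, its arguments, and the sample data are all hypothetical. The grid is assumed to start at the first timestamp, each window is assumed to cover (end - window, end] (the same closure as the default for time-based rolling), and the timestamps are assumed to be timezone-naive.

```python
import numpy as np
import pandas as pd


def iter_time_windows(df, on, window, step):
    """Yield (window_end, rows) pairs for overlapping time windows.

    Hypothetical helper, not part of pandas. Window ends lie on a grid
    that starts at the first timestamp and advances by `step`; each
    window covers the half-open interval (end - window, end].
    """
    df = df.sort_values(on)
    times = df[on].to_numpy()  # datetime64 array, sorted ascending
    if len(times) == 0:
        return
    window = pd.Timedelta(window)
    ends = pd.date_range(start=df[on].iloc[0], end=df[on].iloc[-1],
                         freq=pd.Timedelta(step))
    # side="right" returns the position of the first timestamp strictly
    # greater than the probe value, which gives the (lo, hi] closure.
    lo = np.searchsorted(times, (ends - window).to_numpy(), side="right")
    hi = np.searchsorted(times, ends.to_numpy(), side="right")
    for end, i, j in zip(ends, lo, hi):
        yield end, df.iloc[i:j]


# Irregularly spaced sample data (made up for this sketch).
start = pd.Timestamp("2024-03-20 10:38:44")
offsets = pd.to_timedelta([0, 500, 1700, 3000, 3100, 6000], unit="ms")
df = pd.DataFrame({"times": start + offsets, "data": np.arange(6)})

# 2-second windows evaluated every 1 second, so consecutive windows overlap.
for end, rows in iter_time_windows(df, "times", window="2s", step="1s"):
    print(end, rows["data"].tolist())
```

Because the lookup only relies on the timestamps being sorted, it does not need them to be evenly spaced. Windows that contain no rows come back as empty frames, which the caller can skip or keep as needed.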
