When does the pandas groupby key remain as a column after resampling?


In pandas, groupby followed by resample seems to behave differently depending on which aggregate function comes next. The aggregate max() keeps the groupby key as a column, but the aggregate sum() does not. Could anyone help me understand this difference?

import pandas as pd
import datetime

ts_col = [datetime.datetime.strptime('2024-01-01 01:00:00','%Y-%m-%d %H:%M:%S') + datetime.timedelta(hours=i) for i in range(4)]
sdf = pd.DataFrame({'A': ['a','a','b','b'], 'B':[3,2,1,0], 'T': ts_col}, index=[0,1,2,3])
sdf

The input DataFrame is then:

    A   B                     T
0   a   3   2024-01-01 01:00:00
1   a   2   2024-01-01 02:00:00
2   b   1   2024-01-01 03:00:00
3   b   0   2024-01-01 04:00:00

Applying the sum() aggregate function:

sdf.set_index('T').groupby('A').resample('H').sum()

shows

                          B
A      T
a  2024-01-01 01:00:00    3
   2024-01-01 02:00:00    2
b  2024-01-01 03:00:00    1
   2024-01-01 04:00:00    0

but, applying the max() aggregate function:

sdf.set_index('T').groupby('A').resample('H').max()

shows

                          A     B
A      T
a  2024-01-01 01:00:00    a     3
   2024-01-01 02:00:00    a     2
b  2024-01-01 03:00:00    b     1
   2024-01-01 04:00:00    b     0

The only difference between the two calls is the aggregate function, yet the latter kept the groupby key 'A' as a column (in addition to the index level) while the former did not.
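For comparison, here is a minimal sketch of one way to get the same columns from both aggregates: select the value column 'B' before resampling, so the key 'A' is never among the aggregated columns. This does not explain the difference asked about; it only sidesteps it. The names value_by_group, sum_out and max_out are made up for illustration, and '60min' is used in place of 'H' on the assumption that it is accepted across pandas versions ('H' is deprecated in recent releases).

```python
import pandas as pd

# Same data as in the question.
ts_col = pd.date_range("2024-01-01 01:00:00", periods=4, freq="60min")
sdf = pd.DataFrame({"A": ["a", "a", "b", "b"], "B": [3, 2, 1, 0], "T": ts_col})

# Select 'B' right after the groupby, so only that column is aggregated.
value_by_group = sdf.set_index("T").groupby("A")["B"]

# Each result is a Series indexed by (A, T); to_frame() turns it into a
# one-column DataFrame for easier comparison with the outputs above.
sum_out = value_by_group.resample("60min").sum().to_frame()
max_out = value_by_group.resample("60min").max().to_frame()

print(sum_out)
print(max_out)
```

With this selection both results should carry 'A' only as an index level, whichever aggregate is used.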
