In pandas, groupby().resample() seems to behave differently depending on the aggregation function applied afterwards: max() keeps the groupby key as a column, but sum() does not. Could anyone help me understand this difference?
import pandas as pd
import datetime
ts_col = [datetime.datetime.strptime('2024-01-01 01:00:00','%Y-%m-%d %H:%M:%S') + datetime.timedelta(hours=i) for i in range(4)]
sdf = pd.DataFrame({'A': ['a','a','b','b'], 'B':[3,2,1,0], 'T': ts_col}, index=[0,1,2,3])
sdf
The input DataFrame is then:
   A  B                   T
0  a  3 2024-01-01 01:00:00
1  a  2 2024-01-01 02:00:00
2  b  1 2024-01-01 03:00:00
3  b  0 2024-01-01 04:00:00
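Since the difference involves the string column A, here is a quick sketch for checking which columns pandas treats as numeric. It rebuilds the same input (with a plain datetime constructor instead of strptime) and relies only on DataFrame.dtypes and DataFrame.select_dtypes:

```python
import datetime

import pandas as pd

# Rebuild the same input as above: four hourly timestamps from 2024-01-01 01:00
ts_col = [
    datetime.datetime(2024, 1, 1, 1) + datetime.timedelta(hours=i) for i in range(4)
]
sdf = pd.DataFrame({"A": ["a", "a", "b", "b"], "B": [3, 2, 1, 0], "T": ts_col})

# A holds strings, B integers, T timestamps
print(sdf.dtypes)

# Only B should be reported as numeric; A (strings) and T (datetimes) are not
numeric_cols = list(sdf.select_dtypes(include="number").columns)
print("numeric columns:", numeric_cols)
```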
Applying the sum() aggregate function:
sdf.set_index('T').groupby('A').resample('H').sum()
shows
                       B
A T
a 2024-01-01 01:00:00  3
  2024-01-01 02:00:00  2
b 2024-01-01 03:00:00  1
  2024-01-01 04:00:00  0
whereas applying the max() aggregate function:
sdf.set_index('T').groupby('A').resample('H').max()
shows
                       A  B
A T
a 2024-01-01 01:00:00  a  3
  2024-01-01 02:00:00  a  2
b 2024-01-01 03:00:00  b  1
  2024-01-01 04:00:00  b  0
The only difference is the aggregation function, yet max() keeps the groupby key A as a column while sum() does not.
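For comparison, a sketch that selects the value column B explicitly before resampling, so the grouping column A is never passed to the aggregation. With that selection, sum() and max() come back with the same shape, which at least narrows the difference down to how each aggregation handles the non-numeric column A. It uses the lowercase "h" hourly alias because "H" is deprecated from pandas 2.2 onward; printing the version is just there to make results comparable across installs:

```python
import pandas as pd

# Same input as above, built with pd.date_range for brevity
sdf = pd.DataFrame(
    {
        "A": ["a", "a", "b", "b"],
        "B": [3, 2, 1, 0],
        "T": pd.date_range("2024-01-01 01:00:00", periods=4, freq="h"),
    }
)

print("pandas", pd.__version__)

# Group by A but keep only column B, so A ends up solely in the index
by_a = sdf.set_index("T").groupby("A")[["B"]]

summed = by_a.resample("h").sum()
maxed = by_a.resample("h").max()

print(summed)
print(maxed)
```

Each hourly bin here holds a single row, so both aggregations yield the same values in B and the same (A, T) MultiIndex.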