I have a data frame that looks like this:
date | week | id |
---|---|---|
20/07/21 12:46:00 | 1 | d1 |
20/07/21 12:56:00 | 1 | d1 |
20/07/21 13:09:00 | 1 | d1 |
20/07/21 14:11:00 | 1 | d1 |
20/07/21 14:42:00 | 1 | d1 |
I want to group by date in in 30 minutes interval- so if 2 consecutive rows are more than 30 minutes apart they are on different groups. The output I need looks like this:
week | id | min_date | max_date |
---|---|---|---|
1 | d1 | 20/07/21 12:46:00 | 20/07/21 13:09:00 |
1 | d1 | 20/07/21 14:11:00 | 20/07/21 14:11:00 |
1 | d1 | 20/07/21 14:42:00 | 20/07/21 14:42:00 |
I used this code in order to group by:
x=df.groupby(['id','week', pd.Grouper(key='date', freq='30min',origin="start")]).agg({'date':[np.min, np.max]})
Something isn't working with the grouper, any suggestions how to improve it?
EDIT:
Here's an example of my data that causes an issue:
date | week | id |
---|---|---|
20/07/21 12:46:00 | 1 | d1 |
20/07/21 12:56:00 | 1 | d1 |
20/07/21 13:09:00 | 1 | d1 |
22/07/21 07:11:00 | 1 | d1 |
22/07/21 07:14:00 | 1 | d1 |
22/07/21 07:27:00 | 1 | d1 |
22/07/21 08:34:00 | 1 | d1 |
22/07/21 08:36:00 | 1 | d1 |
The output required is:
week | id | min_date | max_date |
---|---|---|---|
1 | d1 | 20/07/21 12:46:00 | 20/07/21 13:09:00 |
1 | d1 | 20/07/21 07:11:00 | 20/07/21 07:27:00 |
1 | d1 | 20/07/21 08:34:00 | 20/07/21 08:36:00 |
This is the output I get:
week | id | min_date | max_date |
---|---|---|---|
1 | d1 | 20/07/21 12:46:00 | 20/07/21 13:09:00 |
1 | d1 | 20/07/21 07:11:00 | 20/07/21 08:36:00 |
I don't understand why it groups the last rows together when there is more than an hour difference between 20/07/21 07:27:00 and 20/07/21 08:34:00.
Thanks!
You can use:
Or maybe also group by id and week:
Output: