Say I have data in this format in a df:
id sta end dur
40433 2020-01-08 05:06:01 2020-01-08 05:08:14 133
40433 2020-09-22 12:01:26 2020-09-22 12:31:34 1808
40433 2020-09-22 12:05:00 2020-09-22 13:05:00 3600
Maybe in the same df or in a new df, I want to add records that look like this:
id sta end h1 dur
40433 2020-01-08 05:06:01 2020-01-08 05:08:14 05 133
40433 2020-09-22 12:01:26 2020-09-22 12:31:34 12 1808
40433 2020-09-22 12:05:00 2020-09-22 13:05:00 12 3300
40433 2020-09-22 12:05:00 2020-09-22 13:05:00 13 300
dur is in seconds.
I want to groupby id, then day (extracted from sta), then h1, h2, etc., aggregating the dur for the particular hours (h1, etc.), for that id.
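To make the requested split concrete, here is a minimal sketch using only the standard library. It assumes each record is cut at clock-hour boundaries and the seconds are then summed per id, day, and hour. The names split_by_hour, split_rows, and totals are hypothetical and used only for illustration; this is not necessarily how the answer below does it.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def split_by_hour(sta, end):
    """Yield (day, hour, seconds) for each clock hour that [sta, end) overlaps."""
    cur = sta
    while cur < end:
        # Start of the clock hour that follows `cur`.
        next_hour = cur.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        stop = min(next_hour, end)
        yield cur.date(), cur.hour, int((stop - cur).total_seconds())
        cur = stop


# Sample records from the question: (id, sta, end).
rows = [
    (40433, "2020-01-08 05:06:01", "2020-01-08 05:08:14"),
    (40433, "2020-09-22 12:01:26", "2020-09-22 12:31:34"),
    (40433, "2020-09-22 12:05:00", "2020-09-22 13:05:00"),
]

split_rows = []            # one row per (record, hour), like the desired output
totals = defaultdict(int)  # seconds summed per (id, day, hour)

for rid, sta, end in rows:
    start_dt = datetime.strptime(sta, FMT)
    end_dt = datetime.strptime(end, FMT)
    for day, hour, secs in split_by_hour(start_dt, end_dt):
        split_rows.append((rid, sta, end, f"{hour:02d}", secs))
        totals[(rid, day, hour)] += secs

for row in split_rows:
    print(*row)
```

With the sample data, the third record is split into 3300 seconds for hour 12 and 300 seconds for hour 13, as in the desired output. Note that summing per hour does not deduplicate overlapping records: the second and third records overlap, so the total for hour 12 on 2020-09-22 comes to 5108 seconds, more than one hour. The same rows could be loaded into a DataFrame and aggregated with groupby if a pandas result is needed.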
Revised answer based on your comments. For a quicker turnaround, I went with array math and a few conversions after trying a few other methods. There may be a more efficient way, and I am not sure how it performs at scale, but it works. One caveat: if a duration exceeds 24 hours in total, every hour column will show 60 minutes. I left that condition alone so that you can deal with it as you see fit: