I have a dataset with group_ids. I want to attach unique ids to the groups when they are consistently present in my dataset, meaning they disappear for maximum 5 seconds. If they disappear more than 5 seconds, they should get a new accumulative id. If they stay present or only disappear for 5 seconds, they should keep the same accumulative number.
Here is my dataset:
group_id dt2 expect_id
0 nan 2023-11-28 17:43:09.900628 1
1 1 2023-11-28 17:43:11.322793 2
2 1 2023-11-28 17:43:12.660818 2
3 1 2023-11-28 17:43:14.119043 2
4 1 2023-11-28 17:43:15.550513 2
5 2 2023-11-28 17:43:15.550513 3
6 3 2023-11-28 17:43:15.550513 4
7 4 2023-11-28 17:43:15.550513 5
8 1 2023-11-28 17:43:16.973557 2
9 2 2023-11-28 17:43:16.973557 3
10 3 2023-11-28 17:43:16.973557 4
11 4 2023-11-28 17:43:16.973557 5
12 1 2023-11-28 17:43:18.335619 2
13 2 2023-11-28 17:43:18.335619 3
14 3 2023-11-28 17:43:18.335619 4
15 4 2023-11-28 17:43:18.335619 5
16 1 2023-11-28 17:43:19.738230 2
17 2 2023-11-28 17:43:19.738230 3
18 3 2023-11-28 17:43:19.738230 4
19 4 2023-11-28 17:43:19.738230 5
20 1 2023-11-28 17:43:21.110693 2
21 2 2023-11-28 17:43:21.110693 3
22 1 2023-11-28 17:43:22.571257 2
23 2 2023-11-28 17:43:22.571257 3
24 1 2023-11-28 17:43:24.000589 2
25 1 2023-11-28 17:43:25.429940 2
26 2 2023-11-28 17:43:25.429940 3
27 1 2023-11-28 17:43:26.851142 2
28 2 2023-11-28 17:43:26.851142 3
29 1 2023-11-28 17:43:28.256274 2
30 nan 2023-11-28 17:43:29.617541 4
31 nan 2023-11-28 17:43:30.974490 4
32 nan 2023-11-28 17:43:32.360739 4
33 1 2023-11-28 17:43:33.730457 5
34 2 2023-11-28 17:43:33.730457 6
35 1 2023-11-28 17:43:35.270380 5
34 2 2023-11-28 17:43:35.270380 6
35 1 2023-11-28 17:43:35.270380 5
34 2 2023-11-28 17:43:36.730457 6
35 1 2023-11-28 17:43:36.270380 5
I tried this cumsum() method to create my "unique_id" column:
df['group_id'] = (df['group_id'].eq(0) | (df['group_id'] != df['group_id'].shift())).cumsum()
This is the right way forward, but it attaches a new accumulative value to my groups, even if they are present based on the datetime column. Is there a way to implement a logic that sets that only attaches a new accumulative value as unique_id if the group_id appears within at least 5 seconds?
Here, for clarity, 10 seconds are added in lines 12 and 20. In the created column
'delta'
the row difference is in seconds.Update: 01.01.2023
it's more like what you want. My telepathic abilities are not so strong. Therefore, it is not clear what is happening with the lines where NaN?