I have a dataset with 4000 rows that contains duplicate rows (each appearing e.g. 2, 3, or 4 times). I want to find the cumsum of the duplicates over time.
I used this code to assign the duplicate count, but it rearranged the order of the IDs:
df = duplicate_df.value_counts(sort=False, dropna=False).reset_index(name="Duplicity")
Output
ID Time Duplicity
12345 2020 2
12345 2020 2
34567 2021 1
34696 2020 3
34696 2020 3
34696 2020 3
whereas I want to add the Duplicity while keeping the IDs in their original positions:
ID Time Duplicity
34696 2020 3
12345 2020 2
12345 2020 2
34696 2020 3
34696 2020 3
34567 2021 1
How can I find the cumsum of the Duplicity over Time? Thank you.
Input data:
d = {'ID': [34696, 12345, 12345, 34696, 34696, 34567],
'Time': [2020, 2020, 2020, 2020, 2020, 2021]}
Use groupby and transform:
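A minimal sketch of that approach, assuming the input dict is wrapped in a DataFrame (the final cumsum step is an assumption about the intended result, since no expected output for it is shown):

```python
import pandas as pd

d = {'ID': [34696, 12345, 12345, 34696, 34696, 34567],
     'Time': [2020, 2020, 2020, 2020, 2020, 2021]}
df = pd.DataFrame(d)

# transform('size') broadcasts each group's size back onto every row,
# so the original row order is preserved
df['Duplicity'] = df.groupby(['ID', 'Time'])['ID'].transform('size')

# One reading of "cumsum of duplicity over time" (an assumption): sum the
# Duplicity of the unique rows per Time, then accumulate
cum_by_time = df.drop_duplicates().groupby('Time')['Duplicity'].sum().cumsum()
```

Unlike value_counts, transform neither reorders nor collapses rows, so each ID keeps its original position and simply receives its group's count.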