I am trying to remove consecutive duplicate elements and keep the last value in a data frame using pandas


There are two columns in the data frame. I am trying to remove consecutive duplicate elements from the "Patch" column (built from list a), along with the corresponding elements from the "Reward" column (built from list b), while keeping only the last element of each consecutive run.

import pandas as pd
a=[5,5,5,6,6,6,7,5,4,1,8,9]
b=[50,40,45,87,88,54,12,75,55,87,46,98]
df = pd.DataFrame(list(zip(a,b)), columns =['Patch', 'Reward'])
df=df.drop_duplicates(subset='Patch', keep="last")
df = df.set_index('Patch')
print (df)

When I run this, I get:

       Reward
Patch        
6          54
7          12
5          75
4          55
1          87
8          46
9          98

However, what I want is:

Patch      Reward
5           45
6           54
7           12
5           75
4           55
1           87
8           46
9           98

PS: I don't want to remove duplicates that reappear later in the series after a different element. I only want to remove consecutive duplicates, keeping the last element of each consecutive run. I also don't want the result to be sorted; the rows should appear in the same order as in the lists.

Best answer

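You can create a new column that assigns an id to each run of consecutive equal values, then group by that id and take the last row of each group. The id is built by comparing each "Patch" value with the previous one (via shift) and taking the cumulative sum of the "changed" flags, so every row in the same run gets the same id.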

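import pandas as pd

a=[5,5,5,6,6,6,7,5,4,1,8,9]
b=[50,40,45,87,88,54,12,75,55,87,46,98]
df = pd.DataFrame(list(zip(a,b)), columns =['Patch', 'Reward'])
# True wherever Patch differs from the previous row; cumsum turns that into a run id
df["group_id"]=(df.Patch != df.Patch.shift()).cumsum()
# Keep the last row of each run (the result is indexed by group_id)
df = df.groupby("group_id").last()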

Output

Patch  Reward 
5      45
6      54
7      12
5      75
4      55
1      87
8      46
9      98
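
As an alternative to the answer above (not part of it), the same result can be sketched without a helper column: compare each "Patch" value with the next one using shift(-1) and keep only the rows where they differ. The final row is always kept because its shifted neighbour is NaN. The function name keep_last_of_runs is made up for illustration.

```python
import pandas as pd

def keep_last_of_runs(df, col):
    """Keep only the last row of every run of consecutive equal values in `col`."""
    # shift(-1) lines each row up with the value in the row below it;
    # a row ends a run when the next value differs (or there is no next row).
    is_run_end = df[col] != df[col].shift(-1)
    # Boolean indexing preserves the original order; reset the index for tidiness.
    return df[is_run_end].reset_index(drop=True)

a = [5, 5, 5, 6, 6, 6, 7, 5, 4, 1, 8, 9]
b = [50, 40, 45, 87, 88, 54, 12, 75, 55, 87, 46, 98]
df = pd.DataFrame({"Patch": a, "Reward": b})

result = keep_last_of_runs(df, "Patch")
print(result.to_string(index=False))
```

Because the mask selects whole rows, no sorting or aggregation is involved. One caveat that applies to both variants: consecutive NaN values in "Patch" are not collapsed, since NaN never compares equal to NaN.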