Deleting Chained Duplicates


Let's say I have a list:

lits = [1, 1, 1, 2, 0, 0, 0, 0, 3, 3, 1, 4, 5, 2, 2, 2, 0, 0, 0]

and I need this to become [1, 1, 2, 0, 0, 3, 3, 1, 4, 5, 2, 2, 0, 0]. That is, delete duplicates, but only inside a chain of consecutive duplicates, keeping the first and last element of each chain. I am going to do this on a huge HDF5 file with pandas and NumPy, and would rather not use a for loop iterating through all elements.

table = table.drop_duplicates(cols='[SPEED OVER GROUND  [kts]]', take_last=True)

Is there a modification I can do to this code?

There is 1 best solution below

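In pandas you can build a boolean mask that selects a row only if it differs from either the preceding or the succeeding value. That keeps the first and last row of every run of equal values and drops the rows in between: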

>>> import pandas as pd
>>> df = pd.DataFrame({'lits': lits})

>>> df[ (df.lits != df.lits.shift(1)) | (df.lits != df.lits.shift(-1)) ]

    lits
0      1
2      1
3      2
4      0
7      0
8      3
9      3
10     1
11     4
12     5
13     2
15     2
16     0
18     0
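
Since the question also mentions NumPy, here is a minimal sketch of the same idea on a plain array, with no Python-level loop over the elements. The helper name run_edge_mask is made up for illustration; it only relies on array slicing and np.concatenate. Note that NaN never compares equal to itself, so consecutive NaN values would all be kept by this mask.

```python
import numpy as np


def run_edge_mask(values):
    """Return a boolean mask that is True at the first and last element of each run."""
    arr = np.asarray(values)
    if arr.size == 0:
        return np.zeros(0, dtype=bool)
    # changed[i] is True when arr[i + 1] differs from arr[i].
    changed = arr[1:] != arr[:-1]
    starts = np.concatenate(([True], changed))  # element differs from its predecessor
    ends = np.concatenate((changed, [True]))    # element differs from its successor
    return starts | ends


lits = [1, 1, 1, 2, 0, 0, 0, 0, 3, 3, 1, 4, 5, 2, 2, 2, 0, 0, 0]
mask = run_edge_mask(lits)
trimmed = np.asarray(lits)[mask].tolist()
print(trimmed)
# [1, 1, 2, 0, 0, 3, 3, 1, 4, 5, 2, 2, 0, 0]
```

The resulting mask is an ordinary boolean array, so it can also be used to filter a DataFrame by building it from the relevant column's values and indexing the table with it.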