Deleting rows not uniquely identified by the index in Pandas

Question

Deleting rows not uniquely identified by the index in Pandas

137 Views Asked by AllanLRH At 17 August 2025 at 10:09

How do I delete (drop) rows from a Pandas DataFrame, in which the rows are not uniquely identified by the index?
To clarify what I mean by uniquely identified: When querying into the index, multiple values are returned.

I can make a mask which, when used as an argument to df.loc, correctly returns the targeted rows, but I can't make it work with df.drop.

Example code:

The output of the print functions is included as a comment.

from __future__ import print_function
import numpy as np
import pandas as pd


np.random.seed(0)
df = pd.DataFrame(
    data=np.random.randint(0, 10, size=(7, 3)),
    index=[['u01', 'u01', 'u01', 'u02', 'u02', 'u03', 'u03'], ['C', 'C', 'C', 'C', 'T', 'T', 'T']],
    columns=['foo', 'bar', 'baz'])
df.index.names = ['user', 'comType']

print(df)
#                  foo  bar  baz
#    user comType
#    u01  C          5    0    3
#         C          3    7    9
#         C          3    5    2
#    u02  C          4    7    6
#         T          8    8    1
#    u03  T          6    7    7
#         T          8    1    5

mask = (df.baz > 8) | (df.baz < 2)
print(mask)
#    user  comType
#    u01   C          False
#          C           True
#          C          False
#    u02   C          False
#          T           True
#    u03   T          False
#          T          False
#    Name: baz, dtype: bool


print(df.loc[mask])
#                  foo  bar  baz
#    user comType
#    u01  C          3    7    9
#    u02  T          8    8    1

df2 = df.drop(mask.index[mask.values])  # Drops all rows of user user u01, one row of user u02
print(df2)
#                  foo  bar  baz
#    user comType
#    u02  C          4    7    6
#    u03  T          6    7    7
#         T          8    1    5

df3 = df.drop(mask)  # Doesn't do anything
print(df3)
#                  foo  bar  baz
#    user comType
#    u01  C          5    0    3
#         C          3    7    9
#         C          3    5    2
#    u02  C          4    7    6
#         T          8    8    1
#    u03  T          6    7    7
#         T          8    1    5

A quick tip: If you want to run the code on your own machine, I suggest replace the print-functions with these lines, as the output otherwise is hard to discern form one another:

print('df', df, sep=" =\n\n", end="\n"*5)
print('mask', mask, sep=" =\n\n", end="\n"*5)
print('df.loc[mask]', df.loc[mask], sep=" =\n\n", end="\n"*5)
print('df2', df2, sep=" =\n\n", end="\n"*5)
print('df3', df3, sep=" =\n\n", end="\n"*5)

Original Q&A

There are 2 best solutions below

jezrael On 21 December 2016 at 13:54

Use inverting boolean mask by ~ with boolean indexing:

mask = (df.baz > 8) | (df.baz < 2)
print (mask)
user  comType
u01   C          False
      C           True
      C          False
u02   C          False
      T           True
u03   T          False
      T          False
Name: baz, dtype: bool

print(~mask)
user  comType
u01   C           True
      C          False
      C           True
u02   C           True
      T          False
u03   T           True
      T           True
Name: baz, dtype: bool

print (df[~mask])
              foo  bar  baz
user comType               
u01  C          5    0    3
     C          3    5    2
u02  C          4    7    6
u03  T          6    7    7
     T          8    1    5

**EdChum** · Accepted Answer

As you already had a boolean mask, you could've just used this rather than drop by inverting using unary not ~:

In [199]:
mask

Out[199]:
u01  C    False
     C     True
     C    False
u02  C     True
     T    False
u03  T     True
     T    False
Name: baz, dtype: bool

In [198]:    
df[~mask]

Out[198]:
       foo  bar  baz
u01 C    0    5    8
    C    5    2    7
u02 T    6    7    3
u03 T    6    5    5

drop takes label values for dropping, what you passed was a full boolean mask so nothing was dropped as all labels were present, besides what you wanted was the index values but even then it would not have worked:

In [212]:
df.drop(mask[~mask].index)

Out[212]:
       foo  bar  baz
u02 C    3    7    1

because your multi-index labels are repeated, at the first level hence why you end up with a single row, the mask was all you needed

Deleting rows not uniquely identified by the index in Pandas

Example code:

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in INDEXING

Related Questions in MULTI-INDEX

Trending Questions

Popular # Hahtags

Popular Questions