Drop Rows in Pandas If Column Contains Another String


I have two columns, both strings, and I'd like to create a function that removes rows where full_name contains part_name.

import pandas as pd

df = pd.DataFrame({'city_code': ['34', '36', '89', '34'],
                   'full_name': ['WXYZ(24)', 'ZYXW', 'YZWX', 'WXYZ(24)'],
                   'part_name': ['WXYZ', 'ABCD', 'YZWX', 'ABCD']})
print(df)

city_code  full_name  part_name
34         WXYZ(24)   WXYZ
36         ZYXW       ABCD
89         YZWX       YZWX
34         WXYZ(24)   ABCD

The output I want is:

city_code  full_name  part_name
36         ZYXW       ABCD
34         WXYZ(24)   ABCD

These are the only rows where part_name is not contained within full_name. I've tried the code below and received the following error:

df = df[~df['full_name'].str.contains(df['part_name'])]
TypeError: unhashable type: 'Series'

I've seen similar questions on this, but the resolution for those was to use a dictionary, which as far as I can tell isn't suitable here because I need to remove rows based on the values of the two columns relative to each other within the same row.

Please let me know if I can provide any further detail.


There are 2 best solutions below

Panda Kim On

Code

Although vectorized operations might be possible, here is a non-vectorized solution that should work for now.

cond = df.apply(lambda x: x['part_name'] not in x['full_name'], axis=1)
out = df[cond]

out

  city_code full_name part_name
1        36      ZYXW      ABCD
3        34  WXYZ(24)      ABCD

Example Code

Your example code had several typos. A corrected version:

import pandas as pd

df = pd.DataFrame(
    {
        "city_code": ["34", "36", "89", "34"],
        "full_name": ["WXYZ(24)", "ZYXW", "YZWX", "WXYZ(24)"],
        "part_name": ["WXYZ", "ABCD", "YZWX", "ABCD"],
    }
)
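
To make the row-wise check easier to follow, here is a minimal end-to-end sketch that combines the example data with the filter above and prints the intermediate boolean mask. It assumes both columns contain only strings (no missing values); the variable name `row` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city_code": ["34", "36", "89", "34"],
        "full_name": ["WXYZ(24)", "ZYXW", "YZWX", "WXYZ(24)"],
        "part_name": ["WXYZ", "ABCD", "YZWX", "ABCD"],
    }
)

# True where part_name is NOT a substring of full_name in the same row.
# Python's `in` is a literal substring test, so "(" and ")" need no escaping.
cond = df.apply(lambda row: row["part_name"] not in row["full_name"], axis=1)
print(cond.tolist())  # [False, True, False, True]

# Keep only the rows where the mask is True (original index is preserved).
out = df[cond]
print(out)
```

Rows 0 and 2 are dropped because "WXYZ" occurs in "WXYZ(24)" and "YZWX" occurs in "YZWX"; rows 1 and 3 are kept.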
mozway On

This cannot be vectorized, since the comparison has to be performed for each row individually: every row has its own substring to look for.

You can use a list comprehension with zip and boolean indexing:

out = df[[part not in full for full, part in 
          zip(df['full_name'], df['part_name'])]]

Output:

  city_code full_name part_name
1        36      ZYXW      ABCD
3        34  WXYZ(24)      ABCD
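
Since the question asks for a function, the list comprehension above could be wrapped along these lines. This is only a sketch: the function name `drop_rows_where_contained` and its default column names are hypothetical, and it assumes both columns hold plain strings with no missing values (a NaN would make the `in` test raise a TypeError).

```python
import pandas as pd


def drop_rows_where_contained(df, full_col="full_name", part_col="part_name"):
    """Return the rows of df where part_col is NOT a substring of full_col.

    Hypothetical helper; assumes both columns contain only strings.
    """
    # Pair up the two columns row by row and test literal substring membership.
    keep = [part not in full for full, part in zip(df[full_col], df[part_col])]
    # A list of booleans works as a mask; the original index is preserved.
    return df[keep]


df = pd.DataFrame(
    {
        "city_code": ["34", "36", "89", "34"],
        "full_name": ["WXYZ(24)", "ZYXW", "YZWX", "WXYZ(24)"],
        "part_name": ["WXYZ", "ABCD", "YZWX", "ABCD"],
    }
)

result = drop_rows_where_contained(df)
print(result)
```

The function returns a filtered copy and leaves the input DataFrame unchanged; chain `.reset_index(drop=True)` if a fresh 0-based index is wanted.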