Check if string starts with a list of values & doesn't contain a certain value

I have a dataframe:

df = pd.DataFrame([['Jim', 'CF'], ['Toby', 'RW'], ['Joe', 'RWF'], ['Tom', 'RAMF'], ['John', 'RWB']], columns=['Name', 'Position'])

I want to obtain a subset of this dataframe that contains only the rows where:

  • Position is 'RW', 'RWF', or 'RAMF'

I need to do this in one line of code. Currently I can only do it in two lines:

RW = df[df['Position'].str.startswith(('RW', 'RAMF', 'RWF'), na = False)]
RW = RW[RW['Position'].str.contains('RWB')==False]

The issue is that rows with position 'RWB' also match str.startswith('RW'), so I need the second line to remove them.

Is it possible to do this in one line of code?

jezrael On

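To test whether each string starts with one of the values, combine the prefix test with a negated contains test in a single boolean mask: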

RW = df[df['Position'].str.match('RW|RAMF|RWF', na = False) & 
        ~df['Position'].str.contains('RWB', na = False)]
print (RW)
   Name Position
1  Toby       RW
2   Joe      RWF
3   Tom     RAMF

Or:

RW = df[df['Position'].str.startswith(('RW', 'RAMF', 'RWF'), na = False) & 
        ~df['Position'].str.contains('RWB', na = False)]
print (RW)
   Name Position
1  Toby       RW
2   Joe      RWF
3   Tom     RAMF
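
The two conditions above can also be folded into one pattern. The following is only a sketch, assuming that a negative lookahead is acceptable for your data; the variable name `pattern` is introduced here for illustration and is not part of the original answer. Since `str.match` anchors at the start of each string, `(?!.*RWB)` rejects any value containing 'RWB' before the prefix alternatives are tried. 'RWF' is omitted from the alternatives because 'RW' already covers it.

```python
import pandas as pd

df = pd.DataFrame(
    [['Jim', 'CF'], ['Toby', 'RW'], ['Joe', 'RWF'], ['Tom', 'RAMF'], ['John', 'RWB']],
    columns=['Name', 'Position'],
)

# Hypothetical single-pattern variant:
#   (?!.*RWB)    -> fail if 'RWB' appears anywhere in the string
#   (?:RW|RAMF)  -> otherwise require one of the prefixes at the start
pattern = r'(?!.*RWB)(?:RW|RAMF)'

RW = df[df['Position'].str.match(pattern, na=False)]
print(RW)
```

This keeps the rows for Toby, Joe and Tom, the same as the two-mask version, at the cost of a pattern that is harder to read.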

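If you instead need exact matches against the values in the tuple, use isin. Because it compares whole values, 'RWB' is excluded without a second condition: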

RW = df[df['Position'].isin(('RW', 'RAMF', 'RWF'))]

print (RW)
   Name Position
1  Toby       RW
2   Joe      RWF
3   Tom     RAMF
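
On the sample data the prefix test and `isin` return the same rows, but they are not equivalent in general. The sketch below illustrates the difference; the row `['Ann', 'RWM']` is a hypothetical addition that is not in the original dataframe, and the names `prefix` and `exact` are chosen only for this example.

```python
import pandas as pd

# 'Ann' / 'RWM' is a hypothetical extra row added to show the difference.
df = pd.DataFrame(
    [['Toby', 'RW'], ['Joe', 'RWF'], ['John', 'RWB'], ['Ann', 'RWM']],
    columns=['Name', 'Position'],
)

# Prefix test: keeps anything starting with 'RW' or 'RAMF', except values containing 'RWB'.
prefix = df[df['Position'].str.match('RW|RAMF', na=False)
            & ~df['Position'].str.contains('RWB', na=False)]

# Exact test: keeps only values that equal one of the listed positions.
exact = df[df['Position'].isin(['RW', 'RWF', 'RAMF'])]

print(prefix['Name'].tolist())  # ['Toby', 'Joe', 'Ann']
print(exact['Name'].tolist())   # ['Toby', 'Joe']
```

If the set of wanted positions is known in full, `isin` is probably the simpler choice; the prefix test is useful when new positions sharing a prefix should be picked up automatically.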