I have a list of words and phrases which I want to use to specify which rows to remove when creating a new dataframe.
include = ['word1', 'word2', 'word3'...]
exclude = ['word4', 'word5 word6' ...]
So far I have basically tried:
    temp = []
    for word in include:
        valid = df['Message'].str.contains(word)
        count = 0
        for item in valid:
            if item:
                temp.append(df.iloc[count])
            count += 1
Then I remove the extras using temp = pd.DataFrame(temp) and temp = temp.drop_duplicates().
This gives me the included dataframe, but I then want to drop any row that contains a word or phrase from exclude. I am not very skilled with Pandas. I tried the same concept with ~ applied to valid and .drop() instead of .append(), but the rows still remain. How could I go about this?
I assume the words will be embedded in a bigger text; if not, you should consider the Pandas method
df['Message'].isin(list_of_words)
For the first case you can do it either with regex patterns, or without regex patterns by using reduce to combine all the masks (lists of boolean values).
Example without regex pattern
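A minimal sketch of the mask-combining approach. The sample dataframe, the filled-in include/exclude lists, and the helper name any_contains are made up for illustration; regex=False makes str.contains do a literal substring match, and na=False treats missing messages as non-matches.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data; the real df, include and exclude come from the question.
df = pd.DataFrame({'Message': [
    'word1 and nothing else',
    'word2 together with word4',
    'word3 then word5 word6',
    'no listed words here',
]})
include = ['word1', 'word2', 'word3']
exclude = ['word4', 'word5 word6']


def any_contains(series, words):
    """Return a boolean mask that is True where the text contains any of the words."""
    # One boolean mask per word, matched as a literal substring.
    masks = [series.str.contains(w, regex=False, na=False) for w in words]
    # OR the masks together element-wise; this assumes words is not empty.
    return reduce(operator.or_, masks)


include_mask = any_contains(df['Message'], include)
exclude_mask = any_contains(df['Message'], exclude)

# Keep rows that match an included word and no excluded word or phrase.
new_df = df[include_mask & ~exclude_mask]
print(new_df)
```

Boolean indexing with the combined mask replaces both the row-by-row loop and the drop_duplicates step, since each row is selected at most once.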
Example with regex pattern
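A minimal sketch of the regex variant, again with a made-up sample dataframe and filled-in lists. It assumes the words should be matched literally, so each one is passed through re.escape before being joined with | into a single alternation pattern.

```python
import re

import pandas as pd

# Hypothetical sample data; the real df, include and exclude come from the question.
df = pd.DataFrame({'Message': [
    'word1 and nothing else',
    'word2 together with word4',
    'word3 then word5 word6',
    'no listed words here',
]})
include = ['word1', 'word2', 'word3']
exclude = ['word4', 'word5 word6']

# Build one alternation pattern per list, escaping regex metacharacters.
include_pattern = '|'.join(re.escape(w) for w in include)
exclude_pattern = '|'.join(re.escape(w) for w in exclude)

include_mask = df['Message'].str.contains(include_pattern, regex=True, na=False)
exclude_mask = df['Message'].str.contains(exclude_pattern, regex=True, na=False)

# Keep rows that match the include pattern but not the exclude pattern.
new_df = df[include_mask & ~exclude_mask]
print(new_df)
```

If only whole words should count, wrapping each escaped word in \b word boundaries is one option, though that changes what is matched compared with the substring search in the question.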