How do I remove the same (repeated) words from the dataframe named df3? My code below doesn't seem to work.
import numpy as np
import pandas as pd

# c3 is the collection of texts built earlier (not shown here)
df3 = pd.DataFrame(np.array(c3), columns=["content"]).drop_duplicates()

def text_processing_cat3(df3):
    # === Removal of common words: the 10 most frequent ===
    freq = pd.Series(' '.join(df3['content']).split()).value_counts()[:10]
    freq = list(freq.index)
    df3['content'] = df3['content'].apply(
        lambda x: " ".join(w for w in x.split() if w not in freq))
    # === Removal of rare words: the 10 least frequent ===
    freq = pd.Series(' '.join(df3['content']).split()).value_counts()[-10:]
    freq = list(freq.index)
    df3['content'] = df3['content'].apply(
        lambda x: " ".join(w for w in x.split() if w not in freq))
    return df3

print(text_processing_cat3(df3))
The sample output for the above is:
cat_id content
0 3 male malay man nkda walking stick home ambulant ws void deck able walk bendemeer mall home bus stop away adli stays daughter family husband none image image image order cancellation note ct brain duplicate image
1 3 yo chinese man nkda phx hypertension hyperlipidemia benign hyperplasia open cholecystectomy gallbladder empyema distal gastrectomy pud penetrating aortic
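
To make the goal concrete, here is a minimal self-contained sketch of the two behaviors in question: dropping the most and least frequent words across the whole column, and collapsing repeated words within a single row (such as "image image image" in the output above). The function names, the `n_common`/`n_rare` parameters, and the toy data are hypothetical and only illustrate the idea. Unlike the code above, this sketch computes the common and rare words from the same word counts in one pass.

```python
import pandas as pd


def remove_common_and_rare_words(df, column="content", n_common=10, n_rare=10):
    """Return a copy of df with the most and least frequent words removed."""
    out = df.copy()
    # value_counts() sorts words by descending frequency
    counts = pd.Series(" ".join(out[column]).split()).value_counts()
    common = set(counts.index[:n_common]) if n_common else set()
    rare = set(counts.index[-n_rare:]) if n_rare else set()
    drop = common | rare
    out[column] = out[column].apply(
        lambda text: " ".join(w for w in text.split() if w not in drop))
    return out


def drop_repeated_words(text):
    """Keep only the first occurrence of each word, preserving word order."""
    return " ".join(dict.fromkeys(text.split()))


# Hypothetical toy data, not the real df3
toy = pd.DataFrame({"content": ["image image image ct brain image",
                                "image ct scan",
                                "image note"]})

cleaned = remove_common_and_rare_words(toy, n_common=1, n_rare=0)
deduped = toy["content"].apply(drop_repeated_words)
```

Here `cleaned` has the single most frequent word ("image") removed from every row, while `deduped` keeps one copy of each word per row. Working on a copy also leaves the original dataframe unchanged, which makes it easier to compare before and after.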
Please help check and improve the code above. Thank you!