How to remove common and rare words in a pandas DataFrame?


How do I remove the most common and the rarest words from the DataFrame named df3? My code below doesn't seem to work...

    import numpy as np
    import pandas as pd

    # c3 is assumed to be defined earlier (not shown): a list of text strings
    df3 = pd.DataFrame(np.array(c3), columns=["content"]).drop_duplicates()

    def text_processing_cat3(df3):
        # === Removal of common words ===
        # The 10 most frequent words across all rows
        freq = pd.Series(' '.join(df3['content']).split()).value_counts()[:10]
        freq = set(freq.index)
        df3['content'] = df3['content'].apply(
            lambda x: " ".join(w for w in x.split() if w not in freq))

        # === Removal of rare words ===
        # The 10 least frequent words, counted after the common words are gone
        freq = pd.Series(' '.join(df3['content']).split()).value_counts()[-10:]
        freq = set(freq.index)
        df3['content'] = df3['content'].apply(
            lambda x: " ".join(w for w in x.split() if w not in freq))

        return df3

    print(text_processing_cat3(df3))

The sample output for the above is:

       cat_id  content
    0       3  male malay man nkda walking stick home ambulant ws void deck able walk bendemeer mall home bus stop away adli stays daughter family husband none image image image order cancellation note ct brain duplicate image
    1       3  yo chinese man nkda phx hypertension hyperlipidemia benign hyperplasia open cholecystectomy gallbladder empyema distal gastrectomy pud penetrating aortic

Please help me check and improve the code above. Thank you!
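
For reference, here is a minimal sketch of one way the two removal steps could be written, assuming the goal is to drop the N most frequent and N least frequent words counted over the whole `content` column. The helper name `remove_common_and_rare`, its parameters, and the tiny `sample` frame are made up for illustration and are not part of the original code. Unlike the code above, it counts words only once, so the rare words are picked before the common ones are removed.

```python
from collections import Counter

import pandas as pd


def remove_common_and_rare(df, column="content", n_common=10, n_rare=10):
    """Return a copy of df with the most and least frequent words removed.

    Hypothetical helper for illustration; ties in frequency are broken
    arbitrarily, so which equally rare words get dropped is not guaranteed.
    """
    out = df.copy()
    # Count every word once over the whole column.
    counts = Counter(word for text in out[column] for word in text.split())
    ranked = [word for word, _ in counts.most_common()]
    common = set(ranked[:n_common])
    # ranked[-0:] would return the whole list, so guard the n_rare == 0 case.
    rare = set(ranked[-n_rare:]) if n_rare else set()
    drop = common | rare
    out[column] = out[column].apply(
        lambda text: " ".join(w for w in text.split() if w not in drop)
    )
    return out


# Made-up sample data, small enough to check by hand.
sample = pd.DataFrame(
    {"content": ["image image image brain scan", "image brain walk"]}
)
cleaned = remove_common_and_rare(sample, n_common=1, n_rare=2)
print(cleaned)
```

Working on a copy leaves the original DataFrame untouched, which makes it easier to compare the text before and after cleaning.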
