How to remove common and rare words in a pandas DataFrame?

Question

351 Views Asked by AudioBubble At 20 September 2018 at 05:50

How do I remove the most common words and the rarest words from the content column of the DataFrame named df3? My code below doesn't seem to work:

    import numpy as np
    import pandas as pd

    # c3 holds the raw text records and is defined elsewhere
    df3 = pd.DataFrame(np.array(c3), columns=["content"]).drop_duplicates()

    def text_processing_cat3(df3):
        # === Removal of common words: the 10 most frequent tokens ===
        freq = pd.Series(' '.join(df3['content']).split()).value_counts()[:10]
        freq = set(freq.index)
        df3['content'] = df3['content'].apply(
            lambda x: " ".join(w for w in x.split() if w not in freq))

        # === Removal of rare words: the 10 least frequent tokens ===
        freq = pd.Series(' '.join(df3['content']).split()).value_counts()[-10:]
        freq = set(freq.index)
        df3['content'] = df3['content'].apply(
            lambda x: " ".join(w for w in x.split() if w not in freq))

        return df3

    print(text_processing_cat3(df3))

The sample output for the above is:

       cat_id  content
    0       3  male malay man nkda walking stick home ambulant ws void deck able walk bendemeer mall home bus stop away adli stays daughter family husband none image image image order cancellation note ct brain duplicate image
    1       3  yo chinese man nkda phx hypertension hyperlipidemia benign hyperplasia open cholecystectomy gallbladder empyema distal gastrectomy pud penetrating aortic

Please help me check and improve the code above. Thank you!
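
For reference, the two removal steps described above can be sketched in one pass with a single word count. This is only a minimal sketch under stated assumptions, not the asker's method: the helper name remove_common_and_rare and the toy rows are hypothetical, and unlike the code in the question it counts words once instead of recounting after the common words are removed.

```python
from collections import Counter


def remove_common_and_rare(texts, n_common=10, n_rare=10):
    """Drop the n_common most frequent and n_rare least frequent words.

    `texts` is any iterable of whitespace-separated strings (hypothetical
    helper; a pandas column such as df3["content"] can be passed directly).
    Returns a list of cleaned strings in the same order.
    """
    texts = list(texts)
    # Count every token across all rows once.
    counts = Counter(word for text in texts for word in text.split())
    ranked = [word for word, _ in counts.most_common()]  # most frequent first
    drop = set(ranked[:n_common])
    if n_rare > 0:  # guard: ranked[-0:] would select every word
        drop.update(ranked[-n_rare:])
    return [" ".join(w for w in text.split() if w not in drop) for text in texts]


# Hypothetical toy rows, not taken from the question's data.
rows = [
    "image image image man walking home",
    "image man walking home bus",
    "man home walking",
]
cleaned = remove_common_and_rare(rows, n_common=1, n_rare=1)
print(cleaned)  # "image" (most frequent) and "bus" (least frequent) are removed
```

Because the helper returns a plain list in row order, the result could be written back with df3["content"] = remove_common_and_rare(df3["content"]), assuming the column contains only strings. When several words share the lowest count, which of them lands in the last n_rare positions is arbitrary, so a count threshold may be a better fit for real data.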
