I am trying to replace certain words in my dataset with placeholders. However, my method doesn't seem to do anything. I do not get an error, but it also doesn't do what it is supposed to do. What am I doing wrong here?
CODE:
wordlist_urls =['co','https','http', 'www']
wordlist_news = ['nrc','volkskrant','ad', 'telegraaf', 'dagblad','courant']
wordlist_socials = ['twitter','instagram','linkedin', 'blog', 'twitteraccount']
wordlist_links = ['GroenLinks','sp','bij1', 'pvda', 'pvdd', 'DENK']
wordlist_rechts = ['FvD','VVD','PvdA', 'CDA', 'ja21', 'CU', 'SGP', 'Volt', 'bvnl']
wordlist_uni = ['uva','vu','rug', 'university', 'universiteit', 'Utrecht University', 'Leiden university', 'UU']
written_news['placeholders'] = written_news['user_description_clean'].replace(wordlist_urls,'URL')
written_news.loc['placeholders'] = written_news.loc['placeholders'].replace(wordlist_news,'NEWSPAPERS')
written_news.loc['placeholders'] = written_news.loc['placeholders'].replace(wordlist_socials,'SOCIALS')
written_news.loc['placeholders'] = written_news.loc['placeholders'].replace(wordlist_links,'POL_L')
written_news.loc['placeholders'] = written_news.loc['placeholders'].replace(wordlist_rechts,'POL_R')
written_news.loc['placeholders'] = written_news.loc['placeholders'].replace(wordlist_uni,'UNI')
written_news['placeholders']
I tried using the replace() method and expected the words in each word list to show up in the data as the corresponding placeholder. However, the words are still unchanged in the dataset.
It is hard to provide a solution if you don't tell us how your data are formatted.
Looking at your other question here on Stack Overflow, one issue could be that your column user_description_clean is a pandas Series of lists, where each row is a tokenized string stored as a Python list of words. Or perhaps each row is just one string?
In any case, you could write a function that searches for the words with regular expressions. You can then use .apply() with a lambda (lambda x: ...) to replace the words in each row of your data frame. It would look like this:
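A minimal sketch of such a function, assuming each row of user_description_clean is one plain string. The helper name replace_words and the toy data frame are illustrative assumptions, not taken from the real data:

```python
import re
import pandas as pd

def replace_words(text, words, placeholder):
    """Replace every whole-word occurrence of any item in `words` with `placeholder`."""
    # Longest entries first, so a multi-word entry is tried before its parts.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes each entry match literally; \b restricts matches to whole words.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b"
    return re.sub(pattern, placeholder, text, flags=re.IGNORECASE)

# Toy data standing in for the real frame (assumption: one string per row).
written_news = pd.DataFrame(
    {"user_description_clean": ["journalist at nrc see https www example",
                                "student at uva"]}
)
wordlist_urls = ["co", "https", "http", "www"]

# Apply the helper row by row and store the result in a new column.
written_news["placeholders"] = written_news["user_description_clean"].apply(
    lambda x: replace_words(x, wordlist_urls, "URL")
)
print(written_news["placeholders"].tolist())
# ['journalist at nrc see URL URL example', 'student at uva']
```

A regex helper is used here because Series.replace with a list compares whole cell values by default, so it does not touch words inside a longer string unless regex=True is passed.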
The output would look like this:
If you have another list, you just change the input for your arguments in the following way:
which results in:
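With several word lists, one possible way to avoid repeating the call is to loop over a mapping from placeholder to word list. This is only a sketch under the same assumption of one string per row; apply_placeholders and placeholder_map are hypothetical names, and the lists are abridged from the question:

```python
import re
import pandas as pd

# Placeholder -> words it should replace (abridged from the question's lists).
placeholder_map = {
    "NEWSPAPERS": ["nrc", "volkskrant", "telegraaf"],
    "SOCIALS": ["twitter", "instagram", "linkedin"],
    "UNI": ["uva", "vu", "Utrecht University"],
}

def apply_placeholders(text, mapping):
    """Apply every word list in `mapping` to `text`, one placeholder at a time."""
    for placeholder, words in mapping.items():
        # Longest entries first, escaped so they match literally as whole words.
        ordered = sorted(words, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b"
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text

descriptions = pd.Series(["reporter at nrc on twitter", "phd at utrecht university"])
result = descriptions.apply(lambda x: apply_placeholders(x, placeholder_map))
print(result.tolist())
# ['reporter at NEWSPAPERS on SOCIALS', 'phd at UNI']
```

The lists are applied in dictionary order, so a placeholder inserted early could in principle be matched by a later list; choosing placeholders that appear in none of the lists avoids that.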
But again, a minimal reproducible example would be helpful, as it shows how your data are formatted in the first place.
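If the column does turn out to hold tokenized lists rather than strings, the same idea could be applied token by token instead of with a regex. A rough sketch, assuming each row is a list of single-word tokens; replace_tokens and the toy Series are hypothetical:

```python
import pandas as pd

def replace_tokens(tokens, words, placeholder):
    """Return a new token list with every token found in `words` swapped for `placeholder`."""
    # Lower-case lookup set, so 'NRC' and 'nrc' are treated alike.
    lookup = {w.lower() for w in words}
    return [placeholder if t.lower() in lookup else t for t in tokens]

# Toy data (assumption: each row is a list of tokens).
tokenized = pd.Series([["reporter", "at", "nrc"], ["student", "uva"]])
wordlist_news = ["nrc", "volkskrant", "ad", "telegraaf", "dagblad", "courant"]

replaced = tokenized.apply(lambda x: replace_tokens(x, wordlist_news, "NEWSPAPERS"))
print(replaced.tolist())
# [['reporter', 'at', 'NEWSPAPERS'], ['student', 'uva']]
```

Multi-word entries such as 'Utrecht University' would not match here, because each token is compared on its own.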