At first I thought the problem was with my data and that I had made a mistake while cleaning it. However, I checked, and that is not the case.
I am using this code:
import matplotlib.pyplot as plt
from wordcloud import WordCloud

plt.style.use('fivethirtyeight')
# Join all tweets into one string
allWords = ' '.join(df['full_text'])
wordCloud = WordCloud(collocations=True, width=1000, height=600,
                      random_state=21, max_font_size=120).generate(allWords)
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()
Now my wordcloud shows words like "coronaviru", "viru", "crisi". With collocations=True, it shows the full words, but only in combination with other words, like "coronavirus case" and "coronavirus pandemic".
Does anyone know how to fix this?
As I said, I checked the data, and the correct full word is always there. So I guess the mistake happens in the wordcloud.
My data looks like this:
                       created_at              id                      full_text
0  Sat Aug 01 00:25:53 +0000 2020  28934685093219  life is hard with coronavirus
1  Sat Aug 01 00:25:53 +0000 2020  28934685093219              coronavirus sucks
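You need to pass normalize_plurals=False when you construct the WordCloud. According to the documentation, this parameter defaults to True and controls whether a trailing 's' is removed from words, so words such as "coronavirus" and "crisis" can be treated as plurals and shown without the final 's'. Reference: https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html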