Why does my term-document matrix have letters missing at the end of words?

I'm working on creating a word cloud. When I build it, I see that many words are missing their last letters. For example, "movie" becomes "movi" and "become" becomes "becom".

I've marked the affected words in yellow; the last one or two letters are missing.

There are 2 best solutions below

The missing letters at the ends of the words are the result of a preprocessing step: stemming. Skip stemming before creating the DTM or TDM, and build the word cloud from the unstemmed terms.
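As a rough illustration of where the stemming step sits in the pipeline, here is a minimal Python sketch. The question does not show its code, so everything here is an assumption: `term_counts` and its `stem` parameter are hypothetical names, and the lambda is a crude stand-in that only drops a trailing "e", not a real stemmer.

```python
import re
from collections import Counter


def term_counts(documents, stem=None):
    """Count terms across documents; apply `stem` to each token only if given."""
    counts = Counter()
    for text in documents:
        # Lowercase and keep alphabetic runs only (a simplistic tokenizer).
        tokens = re.findall(r"[a-z]+", text.lower())
        if stem is not None:
            # This is the step that truncates words such as "movie".
            tokens = [stem(t) for t in tokens]
        counts.update(tokens)
    return counts


docs = ["The movie was great", "This movie will become a classic"]

# Without stemming, the full words survive into the counts.
unstemmed = term_counts(docs)

# With a (crude, hypothetical) stemming function, the endings are cut off.
crude = term_counts(docs, stem=lambda t: t[:-1] if t.endswith("e") else t)
```

Feeding `unstemmed` to the word cloud keeps "movie" and "become" intact, while `crude` contains "movi" and "becom" instead.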

For those who need the answer to this question: the last letters are missing in the TDM because stemming reduces every word to a common stem, so that related forms of the same word are counted as a single term. The stem is produced by stripping suffixes and does not have to be a dictionary word, which is why "movie" shows up as "movi" and "become" as "becom".
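To make the effect concrete, here is a toy suffix-stripping stemmer in Python. It is only a sketch under stated assumptions: `toy_stem`, `SUFFIXES`, and `group_by_stem` are hypothetical names, and the rule list is far simpler than what a real stemmer (for example, a Porter-style one) applies.

```python
from collections import defaultdict

# Toy rule list (an assumption for illustration), checked in this order.
SUFFIXES = ("ing", "es", "s", "e")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the original words that collapse onto it."""
    groups = defaultdict(list)
    for w in words:
        groups[toy_stem(w)].append(w)
    return dict(groups)


groups = group_by_stem(["movie", "movies", "become", "becoming"])
# groups == {"movi": ["movie", "movies"], "becom": ["become", "becoming"]}
```

The different surface forms end up under one truncated key, which is what then appears as a term in the matrix.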