Tokenization of the data
library(dplyr)
library(tidytext)

tidy_text <- data %>%
  unnest_tokens(word, q_content)
Removal of stop words
data("stop_words")
stop_words
tidy_text <- tidy_text %>%
  anti_join(stop_words, by = "word")
tidy_text %>% count(word, sort = TRUE)
Output showing the 10 most frequent words:
   word        n
 1 im      13012
 2 dont    11197
 3 feel     9168
 4 time     6697
 5 life     4464
 6 ive      4403
 7 people   4233
 8 told     4150
 9 friends  4045
10 love     3281
As explained by @Maurits Evers, the words in your data and the words in
stop_words
do not exactly match: tokenization stripped the apostrophes, so a token like "dont" in your data never matches "don't" in the lexicon. You may remove the apostrophes (') from the words in stop_words before joining. Try:
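For instance, a minimal sketch of that fix (assuming stringr is available; str_remove_all and the stop_words_clean name are illustrative, not necessarily the original answer's exact code):

library(stringr)

# Drop apostrophes so lexicon entries like "don't" become "dont",
# matching the tokens produced by unnest_tokens()
stop_words_clean <- stop_words %>%
  mutate(word = str_remove_all(word, "'"))

tidy_text <- tidy_text %>%
  anti_join(stop_words_clean, by = "word")

After this join, tokens such as im, dont and ive should be filtered out along with the other stop words.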