Starting from this process:
library(stm)
library(tidyr)
library(quanteda)
testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  dfm()
Let's say that we check the term frequencies:
# In quanteda 3.0 and later, textstat_frequency() is provided by the quanteda.textstats package
dftextstat <- textstat_frequency(testDfm)
and we want to remove some specific words from the dfm. According to dftextstat, we want to remove c("and", "to").
Is there any way to do this on the existing dfm, without rerunning the lines that create it?
If you already have a dfm, you can use dfm_remove to remove features. Based on your example:
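A minimal sketch of this, assuming testDfm is built exactly as in the question. The name testDfm2 is only illustrative, and the dfm is rebuilt here just so the snippet runs on its own:

```r
library(stm)       # provides the gadarian data set
library(quanteda)

# Rebuild the dfm from the question so this sketch is self-contained
toks <- tokens(gadarian$open.ended.response,
               remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
testDfm <- dfm(toks)

# Drop the two unwanted features directly from the existing dfm
testDfm2 <- dfm_remove(testDfm, c("and", "to"))

# Check that neither feature remains
c("and", "to") %in% featnames(testDfm2)
```

dfm_remove returns a new dfm and leaves testDfm unchanged, so assign the result back to testDfm if you want to overwrite it.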
It is usually better to remove all the stopwords with:
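A minimal sketch, assuming English text and quanteda's default stopword list from stopwords("en"). The name testDfmNoStop is only illustrative:

```r
library(stm)       # provides the gadarian data set
library(quanteda)

# Rebuild the dfm from the question so this sketch is self-contained
toks <- tokens(gadarian$open.ended.response,
               remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
testDfm <- dfm(toks)

# Remove every feature that appears in the English stopword list
testDfmNoStop <- dfm_remove(testDfm, stopwords("en"))

# "and" and "to" are both in that list, so they are removed as well
c("and", "to") %in% featnames(testDfmNoStop)
```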
If you still have a tokens object, you can use tokens_remove in the same manner, either on its own or inside a pipeline like the one above.
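As a sketch of the pipeline variant, assuming the same packages as in the question (tidyr is loaded only because it supplies the %>% pipe), tokens_remove can sit between tokens() and dfm():

```r
library(stm)       # provides the gadarian data set
library(tidyr)     # re-exports the %>% pipe used below
library(quanteda)

testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  # Drop the unwanted tokens before the dfm is built
  tokens_remove(c("and", "to")) %>%
  dfm()

# Check that neither word made it into the dfm
c("and", "to") %in% featnames(testDfm)
```

Removing at the tokens stage matters mostly if you later build n-grams or need token positions; for a plain dfm the result should match removing the features afterwards with dfm_remove.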