Remove specific words from a dfm


Starting from this process:

    library(stm)       # provides the gadarian example data
    library(tidyr)     # re-exports the %>% pipe
    library(quanteda)

    testDfm <- gadarian$open.ended.response %>%
      tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
      dfm()

Let's say we check the feature frequencies:

    library(quanteda.textstats)  # textstat_frequency() lives here in quanteda >= 3.0
    dftextstat <- textstat_frequency(testDfm)

and we want to remove some specific words from the dfm. Based on dftextstat, we want to remove c("and", "to"). Is there a way to do this on the existing dfm, without rerunning the lines that create it?
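To confirm that the candidate words are present and see how often they occur, core quanteda also offers topfeatures(), which does not need quanteda.textstats. The sketch below assumes the stm package's gadarian data and rebuilds testDfm so that it runs on its own; the variable name freqs is only illustrative.

```r
library(stm)       # provides the gadarian example data
library(quanteda)

# Rebuild the dfm from the question so this snippet is self-contained
testDfm <- dfm(tokens(gadarian$open.ended.response,
                      remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE))

# The 10 most frequent features, as a named numeric vector
print(topfeatures(testDfm, 10))

# Total counts of the two candidate words across all documents
# (freqs is a hypothetical name used only for this example)
freqs <- colSums(testDfm)[c("and", "to")]
print(freqs)
```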


If you already have a dfm, you can use dfm_remove to remove features.

Based on your example:

    # remove "and" and "to"
    testDfm <- dfm_remove(testDfm, c("and", "to"))
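One way to check that the removal worked is to compare the feature set before and after. This is a sketch that again assumes the gadarian data from stm; n_before is a hypothetical helper variable. dfm_remove() only drops feature columns, so the number of documents stays the same.

```r
library(stm)       # provides the gadarian example data
library(quanteda)

testDfm <- dfm(tokens(gadarian$open.ended.response,
                      remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE))
n_before <- nfeat(testDfm)   # hypothetical helper: feature count before removal
docs_before <- ndoc(testDfm)

testDfm <- dfm_remove(testDfm, c("and", "to"))

# Neither word is a feature any more
print(c("and", "to") %in% featnames(testDfm))   # FALSE FALSE

# How many feature columns were dropped
print(n_before - nfeat(testDfm))
```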

It is often better to remove all English stopwords at once, since the default list already includes "and" and "to":

    testDfm <- dfm_remove(testDfm, stopwords("english"))
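stopwords("english") simply returns a character vector, so it can be inspected before use. The sketch below, assuming the gadarian data from stm, checks that both words are on the list and keeps the stopword-free dfm under a separate, hypothetical name (testDfm_nostop) so the original can be compared against it.

```r
library(stm)       # provides the gadarian example data
library(quanteda)

testDfm <- dfm(tokens(gadarian$open.ended.response,
                      remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE))

# Both candidate words are on the default English stopword list
print(c("and", "to") %in% stopwords("english"))   # TRUE TRUE

# Hypothetical name: keep the stopword-free version separately
testDfm_nostop <- dfm_remove(testDfm, stopwords("english"))
print(topfeatures(testDfm_nostop, 10))
```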

If you still have a tokens object, you can use tokens_remove() in the same way, either on that object directly or as a step in the pipeline above, before dfm().
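Both tokens-level variants are sketched below, assuming the gadarian data from stm and loading magrittr for %>%; the names toks and testDfm2 are hypothetical. Because tokens_remove() matches case-insensitively by default, capitalised forms such as "And" are dropped even though the tokens have not been lowercased yet.

```r
library(stm)       # provides the gadarian example data
library(quanteda)
library(magrittr)  # provides %>%

# Option 1: drop the words inside the original pipeline, before dfm()
testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(c("and", "to")) %>%   # case-insensitive matching by default
  dfm()

# Option 2: an existing tokens object (hypothetical name toks)
toks <- tokens(gadarian$open.ended.response,
               remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
toks <- tokens_remove(toks, c("and", "to"))
testDfm2 <- dfm(toks)
```

Removing at the tokens stage means the words never become features, whereas dfm_remove() trims an already-built dfm; for plain word removal like this both end with the same feature set.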