I am working on removing stop words from a body of text with the tidytext approach in R. https://www.tidytextmining.com/tidytext.html
The following example works:
library(tidytext)
library(dplyr)
data(stop_words)
str_v <- "i've been dancing after midnight, i'd know because it's daylight"
str_v %>%
  as_tibble() %>%
  unnest_tokens(word, value) %>%
  anti_join(stop_words, by = "word")
When I apply this method to the data I'm working with, it does not throw an error, but the stop words are not removed. Does something invisible need to happen to the structure of the text for the stop words to match? The output rows appear identical to the stop words (lowercased, whitespace squished, etc.), and yet they remain. I'm working with protected data and am unable to share the source material. Any suggestions or advice on this problem would be super helpful, thank you!
After struggling with the syntax, it turns out the problem was an artifact in the punctuation. In summary: I used mutate() with str_replace_all() on the text vector, and now the stop words are removed.
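For anyone hitting the same thing, here is a minimal sketch of what that fix might look like. The post does not say which punctuation caused the trouble, so this assumes a common culprit: typographic apostrophes (U+2018/U+2019) in place of the ASCII `'` used in `stop_words`. The data frame `raw`, its `text` column, and the `suspects` and `cleaned` names are hypothetical stand-ins for the protected data.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical input: the apostrophes are typographic (U+2019), not ASCII "'"
raw <- tibble(
  text = "i\u2019ve been dancing after midnight, i\u2019d know because it\u2019s daylight"
)

# Diagnosis: tokens that print like stop words but contain non-ASCII characters.
# stri_escape_unicode() makes the hidden code points visible.
suspects <- raw %>%
  unnest_tokens(word, text) %>%
  filter(!stringi::stri_enc_isascii(word)) %>%
  mutate(escaped = stringi::stri_escape_unicode(word))

# Fix (assumed): replace curly apostrophes with ASCII ones before tokenizing,
# then remove stop words as usual
cleaned <- raw %>%
  mutate(text = str_replace_all(text, "[\u2018\u2019]", "'")) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")
```

If `suspects` comes back empty on your data, the mismatch is probably something else (for example a different non-printing character), and the pattern passed to `str_replace_all()` would need to be adjusted accordingly.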