anti_join is not recognizing tidytext stop words in my dataset

435 Views Asked by Averysaurus At 04 June 2025 at 21:15

I am working on removing stop words from a body of text with the tidytext approach in R. https://www.tidytextmining.com/tidytext.html

The following example works:

library(tidytext)
library(dplyr)

data(stop_words)
str_v <- "i've been dancing after midnight, i'd know because it's daylight"

str_v %>%
  as_tibble() %>%
  unnest_tokens(word, value) %>%
  anti_join(stop_words, by = "word")

When I apply this method to the data I'm working with, it does not error, but the stop words are not removed. Does something invisible need to happen to the structure of the text for the stop words to match? The output rows appear identical to the stop words (lowercased, whitespace squished, etc.), and yet they remain. I'm working with protected data and am unable to share the source material. Any suggestions or advice on this problem would be super helpful, thank you!

There is 1 best solution below

Averysaurus On 14 February 2021 at 04:04

After struggling with the syntax, it turns out the problem was a punctuation artifact: my text used curly apostrophes, while the entries in stop_words use straight ones. Summarized as:

"’" != "'"

I used mutate() with str_replace_all() to replace the curly apostrophes in the text column, and now the stop words are removed.

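library(stringr)  # provides str_replace_all()
# Normalize curly apostrophes (U+2019) to straight ones (U+0027) before tokenizing
answer <-
  my_data %>%
  mutate(text = str_replace_all(text, "’", "'"))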
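For anyone who wants to confirm the same cause on their own data, here is a minimal sketch of the before/after behaviour. It assumes, as above, a data frame called my_data with a character column called text; since the real data can't be shared, the tibble below is a hypothetical stand-in that reuses the sentence from the question with curly apostrophes swapped in. The names before and after are just for illustration.

```r
library(dplyr)
library(stringr)
library(tidytext)

data(stop_words)

# The two apostrophes look alike but are different characters.
utf8ToInt("\u{2019}")  # 8217 (curly, U+2019)
utf8ToInt("'")         # 39   (straight, U+0027)

# Hypothetical stand-in for the protected data (assumption, not the real text).
my_data <- tibble(
  text = "i\u{2019}ve been dancing after midnight, i\u{2019}d know because it\u{2019}s daylight"
)

# Without the fix: contractions keep the curly apostrophe,
# so they have no exact match in stop_words.
before <- my_data %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# With the fix: normalize the apostrophe first, then tokenize and join.
after <- my_data %>%
  mutate(text = str_replace_all(text, "\u{2019}", "'")) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Tokens that survived the first join only because of the curly apostrophe.
setdiff(before$word, after$word)
```

If setdiff() returns contractions here, the same normalization step is probably what the real data needs. Other look-alike characters (curly double quotes, non-breaking spaces) could cause similar mismatches, though I only ran into the apostrophe.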
