Getting accented characters recognized when building a custom stopwords lexicon in R

37 Views Asked by Jane Cronin At 28 July 2025 at 15:42

I'm building a custom stopwords lexicon in R and want it to include accented characters so that they are removed. I thought that using the Unicode reference would enable this, but it doesn't work, and I'm having trouble thinking of different solutions, especially as some of these characters would not be covered by using a stopword lexicon from another language.

Current code:

en_custom_stopwords <- bind_rows(data_frame(word = c("8217", "8216", "le", "de", "en", "el", "8221", "8220", "los", "039", "se", 
                                                     "aei", "\\\\U+00E4"), lexicon = c("custom")), stop_words)

This works fine with regular characters.
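
One possible direction, offered as a sketch rather than a confirmed fix: in R string literals the Unicode escape is written "\u00e4", which produces the single character ä, whereas a string such as "\\U+00E4" is a literal backslash followed by the text U+00E4 and so never matches an accented token. The example below assumes dplyr is installed, uses tibble() in place of the deprecated data_frame(), and shortens the word list. `base_stop_words` and `tokens` are hypothetical stand-ins (for tidytext's stop_words and for tokenized text) so that the snippet runs on its own.

```r
library(dplyr)  # bind_rows(), anti_join(); dplyr also re-exports tibble()

# "\u00e4" is R's escape for U+00E4 and yields the single character "ä".
a_umlaut <- "\u00e4"

# Custom entries (shortened from the question), including the accented one.
custom_words <- tibble(
  word = c("le", "de", "en", "el", "los", "se", "aei", a_umlaut),
  lexicon = "custom"
)

# Hypothetical stand-in for tidytext::stop_words so the sketch is self-contained;
# with tidytext loaded, the real stop_words table would be used here instead.
base_stop_words <- tibble(word = c("the", "of", "and"), lexicon = "SMART")

en_custom_stopwords <- bind_rows(custom_words, base_stop_words)

# Hypothetical tokens, one per row, in the shape unnest_tokens() would produce.
tokens <- tibble(word = c("the", "caf\u00e9", "\u00e4", "de", "lexicon"))

# Keep only the tokens that do not appear in the custom stopword list.
kept <- anti_join(tokens, en_custom_stopwords, by = "word")
print(kept$word)
```

If the accented entries still fail to match on real text, one thing worth checking is whether the text stores them in decomposed form (a base letter plus a combining mark). Normalizing both the text and the lexicon first, for example with stringi::stri_trans_nfc(), may help in that case.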
