Emojis and sentiment analysis in R


I am doing text and sentiment analysis in R. I have a dataset of tweets, and I am wondering how to deal with emojis. I am not a programmer, so I don't know whether I should remove them or replace them with something.

So far I've done this (please feel free to correct my code or suggest any way to optimize it):

library(quanteda)

# Build a corpus from the tweet text
tweet_corpus <- corpus(tweets$text)

# Tokenize, then drop punctuation and numbers
doc.tokens <- tokens(tweet_corpus)
doc.tokens <- tokens(doc.tokens, remove_punct = TRUE, remove_numbers = TRUE)

# Remove English stopwords (snowball list)
doc.tokens <- tokens_select(doc.tokens,
  stopwords(language = "en", source = "snowball", simplify = TRUE),
  selection = "remove")

doc.tokens <- tokens_tolower(doc.tokens)

# install.packages("emoji")  # run once if the package is not installed
library(emoji)

# Replace each emoji token with its name from the emoji package's lookup table
replace_emojis <- function(tokens) {
  pattern_vector <- emojis$emoji
  replacement_vector <- paste0(emojis$name, " ")

  tokens <- tokens_replace(tokens, pattern_vector, replacement_vector, case_insensitive = TRUE)

  return(tokens)
}

doc.tokens <- replace_emojis(doc.tokens)

Also, what should I do with links and images?
