Negators and modifiers with Syuzhet vs. SentimentR for sentiment analysis in R


This question has two parts; an answer to either one would solve my problem. I would be very grateful if you could show your suggestion as R code.

1) The NRC lexicon in the syuzhet package covers the broadest range of emotions, but it does not seem to account for negators. After reading the documentation, I am still not sure how to work around this. One idea is to multiply the signs of the negator and the negated word within each sentence, e.g. I (0) am (0) not (-1) angry (-1) gives (-1) * (-1) = 1. However, I don't know how to write this as working code.
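
The sign-multiplication idea above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `toy_polarity`, `toy_negators`, the helper `negation_aware_score()`, and the three-word look-back window are all made up for the example and are not part of syuzhet or the NRC lexicon.

```r
# Toy polarity lexicon and negator list: illustrative assumptions,
# not the NRC lexicon shipped with syuzhet.
toy_polarity <- c(happy = 1, love = 1, angry = -1, bad = -1, hate = -1)
toy_negators <- c("not", "never", "no")

# Hypothetical helper: score one sentence. Each polar word's value is
# multiplied by -1 for every negator found among the `window` words
# immediately before it.
negation_aware_score <- function(sentence, polarity = toy_polarity,
                                 negators = toy_negators, window = 3) {
  # Lowercase, drop punctuation, split on spaces.
  words <- strsplit(tolower(gsub("[^[:alpha:] ]", "", sentence)), " +")[[1]]
  total <- 0
  for (i in seq_along(words)) {
    value <- polarity[words[i]]
    if (is.na(value)) next                      # word is not in the lexicon
    preceding <- tail(words[seq_len(i - 1)], window)
    flips <- sum(preceding %in% negators)       # negators before the word
    total <- total + value * (-1)^flips
  }
  unname(total)
}

test_sentences <- c("I am happy", "I am not happy", "It was bad",
                    "It is never bad", "I am not angry")
scores <- vapply(test_sentences, negation_aware_score, numeric(1),
                 USE.NAMES = FALSE)
scores  # 1 -1 -1 1 1
```

Under these assumptions "I am not angry" scores 1, matching the hand calculation above. The sketch only handles a single polarity value per word; applying the same flip to each NRC emotion column would need a per-emotion lexicon and a decision about what a negated emotion should map to.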

2) After much research and testing, I found that the jockers_rinker lexicon in sentimentr handles negators and modifiers better (https://github.com/trinker/sentimentr#comparing-sentimentr-syuzhet-meanr-and-stanford). I could use sentimentr to "quality test" the syuzhet/NRC results by comparing the binary (positive/negative) sentiment outputs of the two packages. If they deviate too much, NRC isn't accurate enough for that particular body of text. However, I only know how to get the individual scores, not the total score for each sentiment (the sum of the positive and the sum of the negative scores).

You can see how my test results compare below, using a character vector of sentences that express emotions with and without modifiers and negators.

   # syuzhet:
   library("syuzhet")
   MySentiments = c("I am happy", "I am very happy", "I am not happy",
                    "It was bad", "It is never bad", "I love it", "I hate it")
   get_nrc_sentiment(MySentiments, cl = NULL, language = "english")
   # Result:
   anger anticipation disgust fear joy sadness surprise trust negative positive
    0            1       0    0   1       0        0     1        0        1
    0            1       0    0   1       0        0     1        0        1
    0            1       0    0   1       0        0     1        0        1
    1            0       1    1   0       1        0     0        1        0
    1            0       1    1   0       1        0     0        1        0
    0            0       0    0   1       0        0     0        0        1
    1            0       1    1   0       1        0     0        1        0

    # sentimentr:
    library("sentimentr")
    MySentiments = c("I am happy", "I am very happy", "I am not happy",
                     "It was bad", "It is never bad", "I love it", "I hate it")
    sentiment(MySentiments,
              polarity_dt = lexicon::hash_sentiment_jockers_rinker,
              valence_shifters_dt = lexicon::hash_valence_shifters,
              hyphen = "", amplifier.weight = 0.8, n.before = 5, n.after = 2,
              question.weight = 1, adversative.weight = 0.25,
              neutral.nonverb.like = FALSE, missing_value = NULL)
    # Results:
       element_id sentence_id word_count  sentiment
           1           1          3  0.4330127
           2           1          4  0.6750000
           3           1          4 -0.3750000
           4           1          3 -0.4330127
           5           1          4  0.3750000
           6           1          3  0.4330127
           7           1          3 -0.4330127
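
For the totals and the "quality test" described in part 2, a minimal base-R sketch might look like the following. It hard-codes the per-sentence numbers from the two result tables in this post so that it runs without either package; with the packages loaded, the same vectors would presumably come from the `positive` and `negative` columns of the `get_nrc_sentiment()` result and the `sentiment` column of the `sentiment()` result. The variable names are illustrative, and using the share of matching signs as the agreement measure is an assumption, not something either package prescribes.

```r
# Per-sentence scores copied from the two result tables in this post.
nrc_positive <- c(1, 1, 1, 0, 0, 1, 0)
nrc_negative <- c(0, 0, 0, 1, 1, 0, 1)
sentimentr_score <- c(0.4330127, 0.6750000, -0.3750000, -0.4330127,
                      0.3750000, 0.4330127, -0.4330127)

# Totals: sum of the positive scores and sum of the negative scores.
sentimentr_totals <- c(
  positive = sum(sentimentr_score[sentimentr_score > 0]),
  negative = sum(sentimentr_score[sentimentr_score < 0])
)
nrc_totals <- c(positive = sum(nrc_positive), negative = sum(nrc_negative))

# Binary polarity per sentence (+1, 0 or -1) for each package.
nrc_sign <- sign(nrc_positive - nrc_negative)
sentimentr_sign <- sign(sentimentr_score)

# Share of sentences on which the two packages agree, and which ones differ.
agreement <- mean(nrc_sign == sentimentr_sign)   # 5 of 7 sentences
disagreeing <- which(nrc_sign != sentimentr_sign)  # sentences 3 and 5
```

On this test vector the two disagreements are "I am not happy" and "It is never bad", i.e. exactly the negated sentences. Note that `sentiment()` returns one row per sentence, so for texts with several sentences per element the rows would first need to be aggregated per element (sentimentr's `sentiment_by()` does this) before comparing against the NRC rows.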

The first output (syuzhet/NRC) does not seem to account for "very", "not", or "never": the first three sentences receive identical scores, as do "It was bad" and "It is never bad".
