How is sentiment score calculated in the R SentimentAnalysis package?


I'm using the General Inquirer dictionary with the SentimentAnalysis package and I can't figure out how they assign the sentiment score...

For example, if I run the following code:

library(SentimentAnalysis)

sentiment <- analyzeSentiment(sampledf)
summary(sentiment$SentimentGI)

I'll get an output like this:

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.80000 -0.16667 -0.07692 -0.07313  0.00000  0.66667

What's the scale being used here? -1 to 1? I don't know how to interpret these results.

Thanks!

The sentiment scores (e.g. SentimentGI) are calculated with the formula

(#positive - #negative) / #all

where #positive is the number of positive words, #negative the number of negative words, and #all the total word count. Hence, the sentiment score lies in the interval [-1, +1]. A value of 0 indicates that a document contains as many positive as negative words. For example, a score of -0.8 means that negative words outnumber positive words by 80% of the document's total word count.
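A minimal R sketch of the formula, for illustration only; it is not the package's implementation. The word lists `positive_words`/`negative_words` and the helper `sentiment_score()` are hypothetical, and the tokenisation is a naive whitespace split. The package may apply preprocessing such as stemming and stopword removal before counting, so its scores on real data can differ.

```r
# Hypothetical toy word lists; the real General Inquirer dictionary is much larger.
positive_words <- c("good", "great", "happy")
negative_words <- c("bad", "poor", "sad")

# (#positive - #negative) / #all for a single document.
# Naive tokenisation: lower-case, split on whitespace, no stemming or stopword removal.
sentiment_score <- function(text, pos = positive_words, neg = negative_words) {
  words <- strsplit(tolower(text), "[[:space:]]+")[[1]]
  words <- words[words != ""]  # drop empty tokens caused by leading whitespace
  n_pos <- sum(words %in% pos)
  n_neg <- sum(words %in% neg)
  (n_pos - n_neg) / length(words)
}

sentiment_score("good food but bad service and sad staff")
#> [1] -0.125
```

The example document has 8 words, 1 positive and 2 negative, giving (1 - 2) / 8 = -0.125. A document made only of positive words scores +1 and one made only of negative words scores -1.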

NB: In practice, the empirical mean or median is not necessarily located at exactly zero, because positive or negative words may be perceived as stronger, or one kind may simply appear more frequently. Hence, one may prefer a cutoff point other than zero to discriminate positive from negative documents.
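A small sketch of labelling documents with a cutoff other than zero. The vector `scores` is made-up data standing in for `sentiment$SentimentGI`, `classify_sentiment()` is a hypothetical helper, and using the sample median as the default cutoff is only one possible choice; a cutoff tuned on labelled documents may work better.

```r
# Made-up scores standing in for sentiment$SentimentGI (hypothetical data).
scores <- c(-0.8, -0.2, -0.1, 0, 0.05, 0.3)

# Label a document "positive" if its score is above the cutoff, otherwise "negative".
# The sample median is used as the default cutoff purely for illustration.
classify_sentiment <- function(scores, cutoff = median(scores)) {
  ifelse(scores > cutoff, "positive", "negative")
}

classify_sentiment(scores)              # median cutoff (-0.05 for this data)
classify_sentiment(scores, cutoff = 0)  # conventional zero cutoff
```

With the zero cutoff the document scoring exactly 0 is labelled negative, whereas the median cutoff labels it positive, which shows how the choice of cutoff shifts the split.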

Other scores are as follows:

  • Negativity and positivity count only the share of negative or positive words, respectively. For example, negativity is #negative / #all and lies in [0, 1].
  • Polarity uses the formula (#positive - #negative) / (#positive + #negative). It also lies in [-1, +1], but ignores all words that are not in the dictionary.
  • Ratio is the share of dictionary words, i.e. (#positive + #negative) / #all, and lies in [0, 1].
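The measures in the list can be sketched side by side from the three counts. `sentiment_measures()` and its argument names are hypothetical, chosen for illustration; they are not the package's API.

```r
# Compute the listed measures for one document from raw counts.
# n_pos, n_neg: number of positive/negative dictionary words; n_all: total word count.
sentiment_measures <- function(n_pos, n_neg, n_all) {
  c(
    sentiment  = (n_pos - n_neg) / n_all,
    positivity = n_pos / n_all,
    negativity = n_neg / n_all,
    # NaN when the document contains no dictionary words (0 / 0)
    polarity   = (n_pos - n_neg) / (n_pos + n_neg),
    ratio      = (n_pos + n_neg) / n_all
  )
}

sentiment_measures(n_pos = 3, n_neg = 1, n_all = 20)
```

For 3 positive and 1 negative word out of 20, this gives sentiment 0.1, positivity 0.15, negativity 0.05, polarity 0.5 and ratio 0.2. Note that sentiment equals positivity minus negativity, and also polarity times ratio whenever the document contains at least one dictionary word.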