NLTK Vader SentimentIntensityAnalyzer Bigram


For the VADER SentimentIntensityAnalyzer in Python, is there a way to add a bigram rule? I tried updating the lexicon with a two-word entry, but it did not change the polarity score. Thanks in advance!

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()

#returns a compound score of -0.296
print(analyser.polarity_scores('no issues'))

analyser.lexicon['no issues'] = 0.0
#still returns a compound score of -0.296
print(analyser.polarity_scores('no issues'))

Best answer

There is no straightforward way to add a bigram to the VADER lexicon. VADER looks up sentiment one whitespace-separated token at a time, so a lexicon key that contains a space is never matched. However, you can work around this with the following steps:

  1. Represent each bigram as a single token. For example, convert the bigram "no issues" into the token "noissues".
  2. Maintain a dictionary giving the polarity of each newly created token, e.g. {"noissues": 2}, and add these entries to the analyser's lexicon.
  3. Preprocess the text so that each occurrence of a bigram is replaced by its merged token before the text is passed to the sentiment scorer.
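
To see why the merged token is needed, here is a toy model of per-token lookup. It is a simplified sketch, not VADER's actual code: `toy_lexicon` and `matched_entries` are hypothetical names, and the valences are illustrative only.

```python
# Toy model: the text is split on whitespace and each token is looked up
# on its own, so a key containing a space can never be reached.
toy_lexicon = {"no": -1.2, "no issues": 0.0, "noissues": 2.0}  # illustrative values

def matched_entries(text, lexicon):
    """Return the (token, valence) pairs found by per-token lookup."""
    return [(tok, lexicon[tok]) for tok in text.lower().split() if tok in lexicon]

print(matched_entries("no issues", toy_lexicon))  # [('no', -1.2)]
print(matched_entries("noissues", toy_lexicon))   # [('noissues', 2.0)]
```

The two-word key is present in the dictionary but is never hit, while the single merged token is found.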

The following code implements these steps:

import nltk

# analyser is the SentimentIntensityAnalyzer instance created in the question
allowed_bigrams = {'noissues': 2}  # add more as required

# register the merged tokens in the lexicon so VADER can score them
analyser.lexicon.update(allowed_bigrams)

def process_text(text):
    tokens = text.lower().split()  # list of tokens
    bigrams = list(nltk.bigrams(tokens))  # adjacent token pairs as tuples
    bigrams = list(map(''.join, bigrams))  # join each pair without a space
    bigrams.append('...')  # pad so bigrams has the same length as tokens

    # rebuild the text, replacing allowed bigrams with their merged token
    final = ''
    for i, token in enumerate(tokens):
        b = bigrams[i]

        if b in allowed_bigrams:
            join_word = b  # replace the first word with the merged token
            tokens[i + 1] = ''  # blank out the second word so it is skipped
        else:
            join_word = token
        final += join_word + ' '
    return final

text = 'Hello, I have no issues with you'
print(text)
print(analyser.polarity_scores(text))
final = process_text(text)
print(final)
print(analyser.polarity_scores(final))

The output:

Hello, I have no issues with you
{'neg': 0.268, 'neu': 0.732, 'pos': 0.0, 'compound': -0.296}
hello, i have noissues  with you 
{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.4588}

Notice in the output how the two words "no" and "issues" have been joined to form the single token "noissues".
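
As a variation on the same idea, the merging step can also be sketched with the standard library alone. This is only one possible approach under the same assumptions (whitespace tokenisation, lower-casing); `merge_bigrams` is a hypothetical helper name, not part of NLTK or vaderSentiment.

```python
def merge_bigrams(text, merged_tokens):
    """Replace adjacent word pairs whose concatenation is in merged_tokens."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # try the current word joined with the next one first
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in merged_tokens:
            out.append(tokens[i] + tokens[i + 1])
            i += 2  # both words were consumed
        else:
            out.append(tokens[i])
            i += 1
    return ' '.join(out)

allowed_bigrams = {'noissues': 2}
print(merge_bigrams('Hello, I have no issues with you', allowed_bigrams))
# hello, i have noissues with you
```

The result can be passed to polarity_scores once the merged tokens have been added to the analyser's lexicon. Because the text is split on whitespace only, a pair with attached punctuation (for example "no issues,") is not merged. Strip punctuation first if that matters for your data.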