Find the number of positive and negative words in a text using a lexicon


I am trying to figure out how to create a list of lists where each sublist contains the number of positive words and the number of negative words in a given text. Below are the names of the positive and negative lexicon files I am working with, with examples of the words they contain, an example of the texts in the 'X_train' variable, and what the output should look like.


positive_words.txt # happy, great, amazing

negative_words.txt # sad, bad, poor

X_train = ['the food was great and service was amazing', 'i was happy with my food', 'my food tasted bad', 'i am poor and could not buy the food so i am sad but least i have chicken']

X_train_lexicon_features = ?


This is how the output of the above variable should look:

print(X_train_lexicon_features)

OUTPUT: [[2,0], [1,0], [0,1], [0,2]]

# Each sublist has the form [positive, negative]. From the example above, the first text in X_train should yield [2,0] because it contains 'great' and 'amazing', which are both in the positive lexicon.
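As a quick check of the expected format, here is a minimal sketch that computes the [positive, negative] pairs directly. It assumes the two lexicons are small in-memory sets holding only the example words above (the real code reads them from files), and `lexicon_features` is a hypothetical helper name, not something from the question.

```python
# Assumption: tiny in-memory lexicons containing only the example words,
# standing in for the contents of the two lexicon files.
positive_lexicon = {"happy", "great", "amazing"}
negative_lexicon = {"sad", "bad", "poor"}

X_train = [
    "the food was great and service was amazing",
    "i was happy with my food",
    "my food tasted bad",
    "i am poor and could not buy the food so i am sad but least i have chicken",
]


def lexicon_features(text):
    """Hypothetical helper: return [num positive words, num negative words] for one text."""
    words = text.lower().split()
    num_pos = sum(1 for word in words if word in positive_lexicon)
    num_neg = sum(1 for word in words if word in negative_lexicon)
    return [num_pos, num_neg]


X_train_lexicon_features = [lexicon_features(text) for text in X_train]
print(X_train_lexicon_features)  # [[2, 0], [1, 0], [0, 1], [0, 2]]
```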


Below is a class to count the number of positive and negative words.

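class LexiconClassifier:
    def __init__(self):
        # Load the positive lexicon: one word per line, surrounding whitespace stripped.
        self.positive_words = set()
        with open('positive-words.txt', encoding='utf-8') as iFile:
            for row in iFile:
                self.positive_words.add(row.strip())

        # Load the negative lexicon (this file is read as ISO-8859-1, not UTF-8).
        self.negative_words = set()
        with open('negative-words.txt', encoding='iso-8859-1') as iFile:
            for row in iFile:
                self.negative_words.add(row.strip())

    def count_pos_words(self, sentence):
        # Count lowercased, whitespace-separated tokens found in the positive lexicon.
        # Punctuation stays attached to tokens, so 'great,' would not match 'great'.
        num_pos_words = 0
        for word in sentence.lower().split():
            if word in self.positive_words:
                num_pos_words += 1
        return num_pos_words

    def count_neg_words(self, sentence):
        # Count lowercased, whitespace-separated tokens found in the negative lexicon.
        num_neg_words = 0
        for word in sentence.lower().split():
            if word in self.negative_words:
                num_neg_words += 1
        return num_neg_words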
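Since `count_pos_words` and `count_neg_words` each split the sentence separately, one possible variation is a single helper that walks the words once and returns both counts. The sketch below is a hypothetical function (`count_pos_neg_words` is not part of the original class); inside `LexiconClassifier` it would become a method reading `self.positive_words` and `self.negative_words` instead of taking the sets as arguments. The lexicon sets here are stand-ins holding only the example words.

```python
def count_pos_neg_words(sentence, positive_words, negative_words):
    """Hypothetical helper: return [positive count, negative count] in a single pass."""
    num_pos = 0
    num_neg = 0
    for word in sentence.lower().split():
        # Two independent checks, matching the two original methods:
        # a word listed in both lexicons would be counted in both totals.
        if word in positive_words:
            num_pos += 1
        if word in negative_words:
            num_neg += 1
    return [num_pos, num_neg]


# Stand-in lexicons holding only the example words from the question.
positive_words = {"happy", "great", "amazing"}
negative_words = {"sad", "bad", "poor"}

print(count_pos_neg_words("the food was great and service was amazing",
                          positive_words, negative_words))  # [2, 0]
```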

Here is the code I ran to get the number of positive words in each text:

myLC = LexiconClassifier()

X_train_lexicon_features = []

for i in X_train:
    X_train_lexicon_features.append(myLC.count_pos_words(i))

OUTPUT: [2,1,0,0]

What I am unsure of is how to incorporate the 'count_neg_words' method into the code above so that it returns a list of lists like this: [[2,0], [1,0], [0,1], [0,2]].
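One way to get the list of lists is to call both methods in the same loop iteration and append them together as a single [positive, negative] list, instead of appending only the positive count. The sketch below is self-contained for demonstration: it repeats a condensed copy of the class and, as an assumption, writes stand-in lexicon files containing only the example words into a temporary directory so the relative file names resolve.

```python
import os
import tempfile


class LexiconClassifier:
    # Condensed copy of the class above; same file names and encodings.
    def __init__(self):
        with open("positive-words.txt", encoding="utf-8") as f:
            self.positive_words = {row.strip() for row in f}
        with open("negative-words.txt", encoding="iso-8859-1") as f:
            self.negative_words = {row.strip() for row in f}

    def count_pos_words(self, sentence):
        return sum(1 for word in sentence.lower().split() if word in self.positive_words)

    def count_neg_words(self, sentence):
        return sum(1 for word in sentence.lower().split() if word in self.negative_words)


X_train = [
    "the food was great and service was amazing",
    "i was happy with my food",
    "my food tasted bad",
    "i am poor and could not buy the food so i am sad but least i have chicken",
]

# Assumption: stand-in lexicon files with only the example words, written to a
# temporary directory that we switch into so the relative paths resolve.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        with open("positive-words.txt", "w", encoding="utf-8") as f:
            f.write("happy\ngreat\namazing\n")
        with open("negative-words.txt", "w", encoding="iso-8859-1") as f:
            f.write("sad\nbad\npoor\n")
        myLC = LexiconClassifier()  # lexicons are loaded into sets here
    finally:
        os.chdir(old_cwd)  # leave the directory before it is deleted

# Call both methods per text and append them as one [positive, negative] pair.
X_train_lexicon_features = []
for text in X_train:
    X_train_lexicon_features.append([myLC.count_pos_words(text), myLC.count_neg_words(text)])

print(X_train_lexicon_features)  # [[2, 0], [1, 0], [0, 1], [0, 2]]
```

The same loop can also be written as a list comprehension: `[[myLC.count_pos_words(t), myLC.count_neg_words(t)] for t in X_train]`.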

I appreciate any advice and thank you in advance!
