I am trying to figure out how to create a list of lists where each sublist contains the number of positive words and negative words in a given text. Below are the names of the positive and negative word files I am working with, an example of the words they contain, an example of the texts in the 'X_train' variable, and what the output should look like.
positive_words.txt # happy, great, amazing
negative_words.txt # sad, bad, poor
X_train = ['the food was great and service was amazing', 'i was happy with my food', 'my food tasted bad', 'i am poor and could not buy the food so i am sad but least i have chicken']
X_train_lexicon_features = ?
This is how the output of the above variable should look:
print(X_train_lexicon_features)
OUTPUT: [[2,0], [1,0], [0,1], [0,2]]
# From the example above, the first text in X_train should yield [2,0], since it contains 'great' and 'amazing', which are both in the positive lexicon. Each sublist has the form [positive, negative].
Below is a class to count the number of positive and negative words.
    class LexiconClassifier():
        def __init__(self):
            self.positive_words = set()
            with open('positive-words.txt', encoding = 'utf-8') as iFile:
                for row in iFile:
                    self.positive_words.add(row.strip())
            self.negative_words = set()
            with open('negative-words.txt', encoding='iso-8859-1') as iFile:
                for row in iFile:
                    self.negative_words.add(row.strip())

        def count_pos_words(self, sentence):
            num_pos_words = 0
            for word in sentence.lower().split():
                if word in self.positive_words:
                    num_pos_words += 1
            return num_pos_words

        def count_neg_words(self, sentence):
            num_neg_words = 0
            for word in sentence.lower().split():
                if word in self.negative_words:
                    num_neg_words += 1
            return num_neg_words
Here is the code I have run to return the number of positive words per text.
    myLC = LexiconClassifier()
    X_train_lexicon_features = []
    for i in X_train:
        X_train_lexicon_features.append(myLC.count_pos_words(i))
OUTPUT: [2,1,0,0]
What I am unsure of is how to work the 'count_neg_words' function into the code above so that it returns a list of lists like so: [[2,0], [1,0], [0,1], [0,2]].
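For reference, here is a rough sketch of the shape I think the result should take, calling both counting methods for each text and pairing the results. It is only an assumption about how this might be done, not tested against the real lexicon files: `StubLexiconClassifier` is a hypothetical stand-in that keeps the same two method names as my class but takes small in-memory word sets instead of reading the text files.

```python
class StubLexiconClassifier:
    """Hypothetical stand-in for LexiconClassifier (assumption: in-memory sets, no files)."""

    def __init__(self, positive_words, negative_words):
        self.positive_words = set(positive_words)
        self.negative_words = set(negative_words)

    def count_pos_words(self, sentence):
        # Count whitespace-separated tokens that appear in the positive set.
        return sum(word in self.positive_words for word in sentence.lower().split())

    def count_neg_words(self, sentence):
        # Count whitespace-separated tokens that appear in the negative set.
        return sum(word in self.negative_words for word in sentence.lower().split())


X_train = [
    'the food was great and service was amazing',
    'i was happy with my food',
    'my food tasted bad',
    'i am poor and could not buy the food so i am sad but least i have chicken',
]

# Example words taken from the lexicon samples above.
my_lc = StubLexiconClassifier({'happy', 'great', 'amazing'}, {'sad', 'bad', 'poor'})

# Build one [positive, negative] pair per text.
X_train_lexicon_features = [
    [my_lc.count_pos_words(text), my_lc.count_neg_words(text)]
    for text in X_train
]

print(X_train_lexicon_features)  # [[2, 0], [1, 0], [0, 1], [0, 2]]
```

With the real class, the same pairing would presumably work by appending `[myLC.count_pos_words(i), myLC.count_neg_words(i)]` inside the existing loop, but I have not confirmed that against the full word lists.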
I appreciate any advice and thank you in advance!