def word_feats(words):
    return dict([(word, True) for word in words])

for tweet in negTweets:
    words = re.findall(r"[\w']+|[.,!?;]", tweet) #splits the tweet into words
    negwords = [(word_feats(words), 'neg')] #tag the words with feature
    negfeats.append(negwords) #add the words to the feature list
for tweet in posTweets:
    words = re.findall(r"[\w']+|[.,!?;]", tweet)
    poswords = [(word_feats(words), 'pos')]
    posfeats.append(poswords)

negcutoff = len(negfeats)*3/4 #take 3/4ths of the words
poscutoff = len(posfeats)*3/4
trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff] #assemble the train set
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)
classifier.show_most_informative_features()
I am getting the following error when running this code...
  File "C:\Python27\lib\nltk\classify\naivebayes.py", line 191, in train
    for featureset, label in labeled_featuresets:
ValueError: need more than 1 value to unpack
The error is coming from the classifier = NaiveBayesClassifier.train(trainfeats) line and I'm not sure why. I have done something like this before, and my trainfeats seems to be in the same format as it was then. A sample of the format is listed below:
[[({'me': True, 'af': True, 'this': True, 'joy': True, 'high': True, 'hookah': True, 'got': True}, 'pos')]]
What other value does my trainfeats need to create the classifier?
The comment by @Prune is right: your labeled_featuresets should be a sequence of pairs (two-element lists or tuples), each holding a feature dict and a category for one data point. Instead, each element in your trainfeats is a list containing one element: a tuple of those two things. That is why the unpacking in train() finds only one value where it expects two. Lose the square brackets in both feature-building loops and this part should work correctly. E.g., build each entry as (word_feats(words), 'neg') rather than [(word_feats(words), 'neg')].

Two more things: consider using nltk.word_tokenize() instead of doing your own tokenization. And you should randomize the order of your training data, e.g. with random.shuffle(trainfeats).
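Putting the advice above together, here is a minimal sketch of the corrected feature building, with the data shuffled before the 3/4 split. It keeps the question's regex tokenizer so the data-preparation part needs only the standard library. The tweet lists are made-up stand-ins for the question's posTweets and negTweets, and the helpers label_tweets, split_train_test and train_and_report are hypothetical names introduced here for illustration. It is written to run under Python 3; on 2.7 it should also work if you add from __future__ import print_function at the top.

```python
import random
import re

TOKEN_RE = re.compile(r"[\w']+|[.,!?;]")  # same pattern as in the question


def word_feats(words):
    """Map each token to True (bag-of-words presence features)."""
    return {word: True for word in words}


def label_tweets(tweets, label):
    """Return one (featureset, label) tuple per tweet, with no extra list around it."""
    return [(word_feats(TOKEN_RE.findall(tweet)), label) for tweet in tweets]


def split_train_test(feats, fraction=0.75, seed=0):
    """Shuffle a copy of feats, then split it at the given fraction."""
    shuffled = list(feats)
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * fraction)  # int() keeps the slice index an integer
    return shuffled[:cutoff], shuffled[cutoff:]


def train_and_report(trainfeats, testfeats):
    """Train and evaluate the classifier; this part requires NLTK to be installed."""
    import nltk.classify.util
    from nltk.classify import NaiveBayesClassifier

    classifier = NaiveBayesClassifier.train(trainfeats)
    print('accuracy:', nltk.classify.util.accuracy(classifier, testfeats))
    classifier.show_most_informative_features()
    return classifier


# Hypothetical stand-ins for the question's posTweets / negTweets.
posTweets = ["got this joy, high af", "love it!", "so happy today", "great day"]
negTweets = ["this is awful", "hate it!", "so sad today", "worst day"]

negtrain, negtest = split_train_test(label_tweets(negTweets, 'neg'))
postrain, postest = split_train_test(label_tweets(posTweets, 'pos'))
trainfeats = negtrain + postrain
testfeats = negtest + postest
random.Random(0).shuffle(trainfeats)  # mix the two classes in the training list

# NaiveBayesClassifier.train unpacks each element exactly like this loop does,
# so every element must be a (featureset, label) pair.
for featureset, label in trainfeats:
    assert isinstance(featureset, dict) and label in ('pos', 'neg')
```

With NLTK installed, calling train_and_report(trainfeats, testfeats) should then run the same train, accuracy and show_most_informative_features steps as in the question. With only a handful of toy tweets the reported accuracy is not meaningful.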