The Multinomial Naive Bayes classifier gives the correct result, but the other two, the Gaussian NB and the Bernoulli NB, do not. The error is:

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

But even after adding that call (train_set.toarray()), I get this error:

AttributeError: 'list' object has no attribute 'toarray'

The code is

import pickle
from nltk.corpus import names
import random
import nltk
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.svm import SVC, LinearSVC, NuSVC
from nltk.classify.scikitlearn import SklearnClassifier
import numpy as np
import scipy as sc

def gender_features(word):
    return {'last_letter': word[-1]}

labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])
random.shuffle(labeled_names)

featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
train_set, test_set = featuresets[500:], featuresets[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(nltk.classify.accuracy(classifier, test_set)*100)
classifier.show_most_informative_features(5)

MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(train_set)
print("MNB classifier accuracy: ", nltk.classify.accuracy(MNB_classifier, test_set) * 100)

G_classifier = SklearnClassifier(GaussianNB())
G_classifier.train(train_set)
print("Gaussian classifier accuracy: ", nltk.classify.accuracy(G_classifier, test_set) * 100)

B_classifier = SklearnClassifier(BernoulliNB())
B_classifier.train(train_set)
print("Bernoulli classifier accuracy: ", nltk.classify.accuracy(B_classifier, test_set) * 100)

There are 2 best solutions below

Cherler Ton On

Maybe you can try numpy.array(train_set) to turn the list into a dense matrix.
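A note on why train_set.toarray() raises the AttributeError from the question: train_set is a plain Python list of (feature dict, label) pairs, so it has no sparse-matrix methods. The sparse matrix only appears once the feature dicts are vectorized. The sketch below illustrates that, assuming scikit-learn's DictVectorizer does the vectorizing; the four-item train_set is a made-up stand-in for the real data, not the names corpus.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the (features, label) list built in the question.
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'e'}, 'female'),
    ({'last_letter': 'o'}, 'male'),
]

# Split the list into feature dicts and labels.
features, labels = zip(*train_set)

# Vectorizing the dicts is the step that yields a sparse or a dense matrix.
sparse_X = DictVectorizer(sparse=True).fit_transform(features)
dense_X = DictVectorizer(sparse=False).fit_transform(features)

print(type(train_set).__name__)   # a list, which has no .toarray()
print(issparse(sparse_X))         # sparse matrix: GaussianNB rejects this
print(issparse(dense_X))          # dense array: GaussianNB accepts this

# GaussianNB fits without error on the dense matrix.
model = GaussianNB().fit(dense_X, list(labels))
```

So any dense conversion has to be applied to the vectorized feature matrix, not to the list itself.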

Atul Kumar On

I had the same problem. While training, try using:

train_set.todense()

It worked for me.
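Another option that may be worth trying, since train_set in the question is a list and has no matrix methods: NLTK's SklearnClassifier wrapper takes a sparse argument in its constructor, and passing sparse=False should make it hand a dense matrix to the estimator. A minimal sketch, assuming an NLTK version where that argument is available; the small train_set is a made-up stand-in for the real data.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the (features, label) list built in the question.
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'e'}, 'female'),
    ({'last_letter': 'o'}, 'male'),
]

# sparse=False asks the wrapper to vectorize the features into a dense array,
# which GaussianNB requires.
G_classifier = SklearnClassifier(GaussianNB(), sparse=False)
G_classifier.train(train_set)

# Classify one feature dict with the trained wrapper.
prediction = G_classifier.classify({'last_letter': 'a'})
print(prediction)
```

With this approach train_set stays in the (features, label) list format that the rest of the script, including nltk.classify.accuracy, expects.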