How do I properly prepare text data so that an already trained multinomial Naive Bayes model can process it?


I trained a multinomial Naive Bayes model for binary classification of text, to detect whether the text contains personal data.
model = MultinomialNB()
model.fit(X_train, y_train)

I can't figure out how to use the already trained model to check arbitrary new text. Calling y_pred = model.predict(new_X_test) raises "ValueError: X has 1 features, but MultinomialNB is expecting 6328 features as input." As far as I understand, this is related to the CountVectorizer text preprocessor (from sklearn.feature_extraction.text import CountVectorizer).

For example:
X = vectorizer.fit_transform(df['sentence'])
X.shape
returns:
(1600, 6328)
so the object X has 6328 features.
X is the same as the X_train on which the model was trained.

So, in order to test new text on this trained model, I need to process it (using CountVectorizer) in such a way that the object I feed to the model has 6328 features.

But I don't understand at all how to do it.
Or maybe I'm wrong about something?
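
For reference, here is a minimal sketch of the usual scikit-learn pattern for this situation. It assumes the feature mismatch comes from not reusing the fitted vectorizer. The toy sentences, labels, and variable names (train_sentences, new_sentences, new_X) are hypothetical stand-ins for df['sentence'] and the real data. The idea is to call fit_transform() only once, on the training text, and then call transform() on that same vectorizer object for any new text, so the new matrix has the same columns as the training matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df['sentence'] and its labels.
train_sentences = [
    "my phone number is 555 0100",
    "contact me at john at example dot com",
    "the weather is nice today",
    "this film was rather boring",
]
train_labels = [1, 1, 0, 0]  # 1 = contains personal data, 0 = does not

# Fit the vectorizer once, on the training text only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_sentences)

model = MultinomialNB()
model.fit(X_train, train_labels)

# For new text, call transform() (not fit_transform()) on the SAME vectorizer.
# Input must be a list of strings; words unseen in training are ignored.
new_sentences = ["call my phone number tomorrow"]
new_X = vectorizer.transform(new_sentences)

# The column count now matches the training matrix, so predict() accepts it.
print(new_X.shape[1] == X_train.shape[1])
y_pred = model.predict(new_X)
print(y_pred)
```

If prediction runs in a separate script or session, the fitted vectorizer has to be saved and loaded together with the model, since a freshly created CountVectorizer would learn a different vocabulary.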
