I took this example from the SKLearn website. Here's the initial code:
from sklearn.feature_extraction.text import CountVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
# WORKING: assigning CountVectorizer() to a variable "vectorizer"
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names()
>>> ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
# NOT WORKING
X = CountVectorizer().fit_transform(corpus)
CountVectorizer().get_feature_names()
>>> NotFittedError: Vocabulary not fitted or provided
I'm confused at this point. Why do we have to assign CountVectorizer() to a variable if both snippets are doing exactly the same thing?
In the first example, you create one CountVectorizer() object and use it throughout the entire code snippet. In the second example, the two CountVectorizer() calls create two different objects. Let's walk through the code.
In the first line, we create a new CountVectorizer() object, call .fit_transform() on it, and assign the result of that call to X. In the second line, we create a different CountVectorizer() object and call .get_feature_names() on it. This object is completely independent of the first one we created; it shares no state with the original object. Since .fit_transform() has never been called on this second object, it has no vocabulary, and scikit-learn raises a NotFittedError stating that the vocabulary hasn't been fitted.