ValueError: array must not contain infs or NaNs with NMF and TF-IDF in Python


I'm trying to estimate topics by applying NMF to a TF-IDF matrix. However, when I run the following lines:

nmf = NMF(n_components = dimension)
nmf_array = nmf.fit_transform(x_tfidf)

I get this error:

ValueError: array must not contain infs or NaNs

But when I search for NaNs and infs inside the TF-IDF matrix, I can't find any:

np.isinf(x_tfidf.data).any()  # this returns False
np.isnan(x_tfidf.data).any()  # this also returns False
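
For anyone reproducing the check above, here is a slightly broader sanity check as a minimal sketch. It assumes x_tfidf is a SciPy sparse matrix, as returned by TfidfTransformer. The helper name summarize_matrix and the toy matrix are hypothetical, and the sketch only reports basic facts about the stored values; it does not identify the cause of the error.

```python
import numpy as np
from scipy import sparse


def summarize_matrix(x):
    """Return basic sanity facts about a sparse matrix (hypothetical helper)."""
    x = sparse.csr_matrix(x)
    stored = x.data  # only the explicitly stored (non-zero) entries
    return {
        "shape": x.shape,
        "all_finite": bool(np.isfinite(stored).all()),  # no NaN, +inf or -inf
        "has_negative": bool((stored < 0).any()),       # NMF expects non-negative input
    }


# Toy stand-in for x_tfidf; the real matrix would come from TfidfTransformer.
toy = sparse.csr_matrix(np.array([[0.0, 0.6, 0.8], [1.0, 0.0, 0.0]]))
report = summarize_matrix(toy)
print(report)
```

On the real data, the call would be summarize_matrix(x_tfidf). np.isfinite covers both the NaN and the inf check in a single pass.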

The complete code is:

import nltk
from nltk.corpus import stopwords
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# textos and stemmed_words are defined elsewhere (not shown here)
nltk.download('stopwords')
stop_words_sp = stopwords.words('spanish')
custom_stop_words = ["https", "citar", "www", "com", "youtube", "mil", "ar", "hs"]
stop_words = custom_stop_words + stop_words_sp
count_vect = CountVectorizer(max_df=0.9, min_df=0.1, stop_words=stop_words, lowercase=True, analyzer=stemmed_words)
x_counts = count_vect.fit_transform(textos)

# Build the TF-IDF weighted matrix
tfidf_transformer = TfidfTransformer()
x_tfidf = tfidf_transformer.fit_transform(x_counts)
lda = NMF(n_components=2)
lda_array = lda.fit_transform(x_tfidf)

Here, the variable textos is an array of Spanish texts that contains no empty strings.
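
A text that is not an empty string can still end up with no terms after vectorization, because min_df and max_df prune the vocabulary. The sketch below shows one way to list such documents. The toy corpus and the min_df value are assumptions chosen for illustration, not the data from this question, and whether empty rows are related to the error is not established here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for `textos`; the threshold is illustrative only.
docs = ["gato perro", "gato perro", "gato perro", "pez"]
vectorizer = CountVectorizer(min_df=0.5)  # keep terms found in at least half the documents
counts = vectorizer.fit_transform(docs).tocsr()

# In CSR format, consecutive differences of indptr give the stored entries per row.
terms_per_doc = np.diff(counts.indptr)
empty_docs = np.flatnonzero(terms_per_doc == 0)
print(empty_docs)  # indices of documents left with no terms
```

In this toy corpus, "pez" appears in only one of four documents, so it is pruned and the last document is left without any terms. The same two lines can be applied to x_counts from the code above.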

This is the complete error traceback:

[image: screenshot of the traceback]
