I'm trying to estimate topics with NMF over a TF-IDF matrix. However, when I run the following lines:
nmf = NMF(n_components = dimension)
nmf_array = nmf.fit_transform(x_tfidf)
I get this error:
ValueError: array must not contain infs or NaNs
But when I search for NaNs and infs inside the TF-IDF matrix, I can't find any:
np.isinf(x_tfidf.data).any() # this returns False
np.isnan(x_tfidf.data).any() # this also returns False
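Because unstored entries of a SciPy sparse matrix are implicit zeros, checking x_tfidf.data as above is enough to rule out NaNs and infs in the whole matrix. Other properties of the matrix can still be worth inspecting before calling NMF, such as its shape after the min_df/max_df filtering and whether any document row or term column ended up empty. The following is a minimal diagnostic sketch that reports these in one place; inspect_sparse is a hypothetical helper name (not part of scikit-learn or SciPy), and it is an inspection aid, not a known fix for the error.

```python
import numpy as np
import scipy.sparse as sp


def inspect_sparse(x):
    """Hypothetical helper: summarize a matrix's shape, non-finite values and empty rows/columns."""
    x = sp.csr_matrix(x)  # accepts sparse or dense input; CSR makes row/column counts cheap
    row_nnz = np.diff(x.indptr)  # number of stored entries in each row (document)
    col_nnz = np.bincount(x.indices, minlength=x.shape[1])  # stored entries per column (term)
    return {
        "shape": x.shape,
        # Only stored values can be non-finite; implicit entries are zeros.
        "nonfinite": int((~np.isfinite(x.data)).sum()),
        "empty_rows": int((row_nnz == 0).sum()),
        "empty_cols": int((col_nnz == 0).sum()),
    }
```

For example, print(inspect_sparse(x_tfidf)) on the matrix built in the code below shows how many terms survived the vectorizer's filtering and how many documents have no remaining terms.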
The complete code is:
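import nltk
import numpy as np
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.decomposition import NMF
nltk.download('stopwords')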
stop_words_sp = stopwords.words('spanish')
custom_stop_words = ["https", "citar", "www", "com", "youtube", "mil","ar", "hs"]
stop_words = custom_stop_words + stop_words_sp
count_vect = CountVectorizer(max_df=0.9, min_df=0.1, stop_words=stop_words, lowercase=True, analyzer=stemmed_words)
x_counts = count_vect.fit_transform(textos)
# Build the matrix with TF-IDF weighting
tfidf_transformer = TfidfTransformer()
x_tfidf = tfidf_transformer.fit_transform(x_counts)
lda = NMF(n_components = 2)
lda_array = lda.fit_transform(x_tfidf)
The variable textos is an array of Spanish texts with no empty strings.
This is the complete traceback:
