UMAP "ValueError: cannot assign slice from input of different size"

I am using umap-learn 0.5.3 for dimensionality reduction of a NumPy array. The array, call it arrival_tfidf, has shape (7898, 2969) and holds the float64 TF-IDF representation of 7898 messages. When I run the following snippet:

import umap.umap_ as umap
umap_embeddings = (umap.UMAP(n_neighbors=15,
                             n_components=4,
                             metric='cosine',
                             random_state=42)
        .fit_transform(arrival_tfidf))

I get the following error:

ValueError: cannot assign slice from input of different size

However, when I use a random NumPy array of identical shape, random_df = np.random.rand(7898, 2969), instead of arrival_tfidf, everything works fine.

I noticed that arrival_tfidf is rather sparse: only around 100,000 of its elements are nonzero.
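
To put the sparsity in numbers: about 100,000 nonzero entries out of 7898 × 2969 = 23,449,162 is a density of roughly 0.43%, whereas np.random.rand produces an essentially fully dense array. The sketch below uses plain NumPy; density and make_sparse_like are hypothetical helpers (not part of umap-learn). It measures density and builds a random array with a chosen shape and number of nonzeros, which could stand in for arrival_tfidf to check whether sparsity alone triggers the error.

```python
import numpy as np


def density(a: np.ndarray) -> float:
    """Fraction of entries in `a` that are nonzero."""
    return np.count_nonzero(a) / a.size


def make_sparse_like(n_rows: int, n_cols: int, nnz: int, seed: int = 42) -> np.ndarray:
    """Hypothetical helper: a dense float64 array with exactly `nnz` nonzero
    entries at random positions, loosely mimicking a sparse TF-IDF matrix."""
    rng = np.random.default_rng(seed)
    out = np.zeros((n_rows, n_cols), dtype=np.float64)
    # Pick `nnz` distinct flat positions (no repeats, so no overwrites).
    flat_idx = rng.choice(n_rows * n_cols, size=nnz, replace=False)
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1]: never exactly 0.
    np.put(out, flat_idx, 1.0 - rng.random(nnz))
    return out
```

For example, make_sparse_like(7898, 2969, 100_000) gives an array of the question's shape (about 188 MB as dense float64, like the original) that can be passed to the same UMAP call. If it fails the same way while random_df succeeds, that would suggest sparsity rather than something specific to the TF-IDF data.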

Answers

I downgraded pynndescent to 0.5.8, and that fixed it.

I had the same problem. I downgraded pynndescent from version 0.5.10 to version 0.5.8, and now it works for me. Sparse data might have something to do with it: my data is also sparse, and the same code works well on other kinds of data. Most of my failed UMAP configurations were ones that used robust scaling.
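
Both answers point at pynndescent (the nearest-neighbour library that umap-learn depends on) rather than umap-learn itself: 0.5.10 is reported to fail and 0.5.8 to work. A small stdlib-only check of the installed version might look like the sketch below. The helper names and the KNOWN_GOOD constant are illustrative assumptions, and the simplified parser is not a full PEP 440 implementation.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

KNOWN_GOOD = "0.5.8"  # the version both answers report as working


def parse_release(v: str) -> Tuple[int, ...]:
    """'0.5.10' -> (0, 5, 10). Stops at the first non-numeric part,
    so '0.5.8rc1' -> (0, 5, 8). Simplified; not a full PEP 440 parser."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # a suffix such as 'rc1' ends the release segment
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Installed distribution version, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def newer_than_known_good(v: str, known_good: str = KNOWN_GOOD) -> bool:
    # Compare integer tuples, not strings.
    return parse_release(v) > parse_release(known_good)


if __name__ == "__main__":
    v = installed_version("pynndescent")
    if v is None:
        print("pynndescent is not installed")
    elif newer_than_known_good(v):
        print(f"pynndescent {v} is newer than {KNOWN_GOOD}; the answers above "
              f'report success after: pip install "pynndescent=={KNOWN_GOOD}"')
    else:
        print(f"pynndescent {v} (not newer than {KNOWN_GOOD})")
```

The release numbers are compared as integer tuples because plain string comparison gets exactly this case wrong: "0.5.10" < "0.5.8" is True for strings. Where the third-party packaging library is installed, packaging.version.Version is a more complete alternative.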