I am working with a Spotify tracks database and trying to understand how the columns danceability, liveness and energy affect popularity (I use a discretized popularity: -1, 0, 1). I want to reduce the dimensionality from three columns to two. Here's the snippet:
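import matplotlib.pyplot as plt
import umap  # provided by the umap-learn package
from matplotlib.colors import ListedColormap

# X: the three feature columns; y: discrete popularity (-1, 0, 1)
reducer = umap.UMAP(n_neighbors=10, min_dist=0.1)
X_reduced = reducer.fit_transform(X)

plt.figure(figsize=(10, 5))
plt.title('Projecting %d-dimensional data to 2D' % X.shape[1])
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, edgecolor='none', s=20,
            cmap=ListedColormap(['yellow', 'red', 'green']))
# Ticks match the actual label values (-1, 0, 1) rather than range(3)
plt.colorbar(ticks=[-1, 0, 1], label='popularity value')
plt.show()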
But the resulting plot shows popular and unpopular tracks completely mixed together, so they are indistinguishable (picture 1), whereas I need a plot with three distinct clusters (picture 2).
[picture 1] [picture 2]
I think the problem may be in the UMAP parameters, or maybe this is simply a bad clustering task. I tried changing the parameters, but it didn't help.
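To check the "bad clustering task" possibility separately from UMAP, one rough option is to measure how far apart the popularity classes are in the original features. The sketch below is only an illustration: `separation_ratio` is a hypothetical helper (not part of umap-learn), and it computes, per feature, the ratio of between-class to within-class variance. A ratio close to 0 would suggest that the class means overlap on that feature, so an unsupervised projection is unlikely to split the classes, although this simple check cannot rule out nonlinear structure.

```python
from collections import defaultdict
from statistics import fmean


def separation_ratio(rows, labels):
    """Per-feature ratio of between-class to within-class variance.

    rows:   sequence of equal-length numeric feature vectors
    labels: class label of each row (e.g. -1, 0, 1)
    """
    # Group the rows by their class label
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    ratios = []
    for j in range(len(rows[0])):
        overall_mean = fmean(row[j] for row in rows)
        between = 0.0
        within = 0.0
        for members in by_class.values():
            class_mean = fmean(row[j] for row in members)
            # Spread of the class means around the overall mean
            between += len(members) * (class_mean - overall_mean) ** 2
            # Spread of the points around their own class mean
            within += sum((row[j] - class_mean) ** 2 for row in members)
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


# Toy data (made up): feature 0 separates the classes, feature 1 does not
toy_rows = [[0.0, 0.5], [0.2, 0.4], [1.0, 0.5], [1.2, 0.4]]
toy_labels = [-1, -1, 1, 1]
print(separation_ratio(toy_rows, toy_labels))
```

On the real data this could be called as `separation_ratio(X.tolist(), y.tolist())`, assuming `X` and `y` are NumPy arrays (an assumption, since the question does not show how they are built). If all three ratios come out near 0, tuning `n_neighbors` and `min_dist` alone probably will not produce three clusters.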