I am using k-means to cluster articles, and it works well. Now I want to specify the initial centers myself to get more reasonable results.
My Python code:
tfidf_matrix = tfidf_vectorizer.fit_transform(articles)
X = np.array([[-19.6748, -8.546],
              [22.010807, -10.9737],
              [11.9597, 19.2701],
              [12.2547, 11.2381],
              [16.6497, -15.2251],
              [19.8597, 13.2601]], dtype=np.float64)
km = KMeans(n_clusters=6, init=X, n_init=1).fit(tfidf_matrix)
When I try to define the initial centroids, I get the following error:
ValueError: The number of features of the initial centers 2 does not match the number of features of the data 4602.
From the error I get the idea that the dimensions are not equal. How can I transform my initial centers to satisfy the dimensions of the sparse matrix?
The number of features in the centroids should be the same as the number of features in the data.
Your input data (tfidf_matrix) has shape (1111, 8262), i.e. 1111 samples with 8262 features each. Your 6 centroids must therefore also have 8262 features, so the shape of X should be (6, 8262).
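One way to get centers with the right number of features is to take them from the TF-IDF space itself, for example by choosing one representative article per cluster and using its row as the initial center. The following is only a minimal sketch of that idea: the toy corpus and the seed_rows indices are made up for illustration, and in practice you would pick the rows of articles you consider typical for each cluster.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for `articles`.
articles = [
    "stock markets fall as interest rates rise",
    "central bank raises interest rates again",
    "football team wins the championship final",
    "striker scores twice in the cup final",
    "new smartphone released with faster chip",
    "chip maker unveils faster processor",
]

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(articles)

# Hypothetical choice: one seed article per desired cluster.
seed_rows = [0, 2, 4]

# KMeans expects `init` as a dense array of shape (n_clusters, n_features),
# so the selected sparse rows are converted with toarray().
init_centers = tfidf_matrix[seed_rows].toarray()

# The check that failed in the question: the feature counts must match.
assert init_centers.shape[1] == tfidf_matrix.shape[1]

km = KMeans(n_clusters=len(seed_rows), init=init_centers, n_init=1).fit(tfidf_matrix)
print(init_centers.shape, tfidf_matrix.shape)
print(km.labels_)
```

If the 2-D coordinates in X come from a 2-D projection of the articles, the alternative would be to fit KMeans on that projected data instead of on tfidf_matrix, so that both the data and the centers have 2 features.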