Sklearn k-means clustering (weighted), determining optimum sample weight for each feature?


K-means clustering in sklearn, where the number of clusters is known in advance (it is 2). There are multiple features. Feature values initially have no weights assigned, i.e. they are treated as equally weighted. However, the task is to assign custom weights to each feature in order to get the best possible cluster separation. How can I determine optimum sample weights (sample_weight) for each feature, in order to get the best possible separation of the two clusters? If this is not possible for k-means, or for sklearn, I am interested in any alternative clustering solution; the point is that I need a method of automatically determining appropriate weights for multivariate features, in order to maximize cluster separation.

2 Answers

Answer 1 (0 votes):

From what I understand of the sklearn docs, sample_weight is used to give a weight to each observation (sample), not to each feature.
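To make the distinction concrete, here is a minimal sketch (toy data of my own) showing that `sample_weight` in `KMeans.fit` expects one weight per row of `X`, i.e. per sample:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: 6 samples (rows), 3 features (columns), two obvious groups.
X = np.array([[0.0, 0.1, 0.2],
              [0.1, 0.0, 0.1],
              [0.2, 0.1, 0.0],
              [5.0, 5.1, 5.2],
              [5.1, 5.0, 5.1],
              [5.2, 5.1, 5.0]])

# sample_weight must have one entry per SAMPLE (6 here), not per feature (3):
w = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X, sample_weight=w)
print(km.labels_)  # one cluster label per sample
```

Passing a length-3 array (one weight per feature) here would raise a shape error.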

If you want to give weights to your features, you can refer to this post: How can I change feature's weight for K-Means clustering?
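The usual trick from that direction is to fold the feature weights into the data itself: since k-means uses squared Euclidean distance, multiplying (standardized) column j by sqrt(w_j) is equivalent to clustering with a w_j-weighted distance. A minimal sketch with made-up data and weights:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data: feature 0 separates the two clusters, feature 1 is pure noise.
rng = np.random.RandomState(0)
X = np.column_stack([
    np.concatenate([rng.normal(0, 0.2, 20), rng.normal(4, 0.2, 20)]),
    rng.normal(0, 1.0, 40),
])

# Hypothetical per-feature weights (these are the values one would tune):
feature_weights = np.array([2.0, 0.5])

# Scaling column j by sqrt(w_j) makes plain squared Euclidean distance
# equal to the w_j-weighted squared distance, so KMeans effectively
# clusters with weighted features.
Xw = StandardScaler().fit_transform(X) * np.sqrt(feature_weights)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xw)
```

This does not answer how to *choose* the weights automatically, but it is the mechanism any weight-selection scheme would plug into.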

Answer 2 (0 votes):

In the meantime, I have implemented the following: cluster on each component (feature) separately, then compute the silhouette score, Calinski-Harabasz score, Dunn index, and inverse Davies-Bouldin score for each component separately. Then scale those scores to the same magnitude and reduce them with PCA to one feature. This produces a weight for each component, and the approach seems to give reasonable results. I suppose a better approach would be a full factorial experiment (DOE), but this simple approach produces satisfactory results as well.
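A minimal sketch of that pipeline is below. Assumptions of mine: sklearn ships no Dunn index, so only the three sklearn-provided metrics are used; the function name `feature_weights`, the min-max scaling of the score columns, and the sign/normalization handling of the PCA output are my choices, not part of the original description:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

def feature_weights(X, n_clusters=2, random_state=0):
    """Weight each feature by how well it separates the clusters on its own."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, [j]]  # cluster on this single feature
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state).fit_predict(col)
        scores.append([
            silhouette_score(col, labels),
            calinski_harabasz_score(col, labels),
            1.0 / davies_bouldin_score(col, labels),  # inverse: higher = better
        ])
    # Scale each score column to [0, 1] so the metrics have comparable
    # magnitudes, then collapse them to one value per feature with PCA.
    S = MinMaxScaler().fit_transform(np.array(scores))
    w = PCA(n_components=1).fit_transform(S).ravel()
    if np.corrcoef(w, S.mean(axis=1))[0, 1] < 0:
        w = -w            # PCA sign is arbitrary; orient so higher = better
    w -= w.min()          # shift to non-negative
    return w / w.sum()    # normalize the weights to sum to 1
```

On data where one feature cleanly separates the two clusters and the rest are noise, the separating feature should receive the largest weight; the resulting vector can then be applied by scaling each standardized column by the square root of its weight before the final k-means run.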