K-Means - Why does the optimal number of clusters vary with Silhouette Analysis?

I am using Silhouette Analysis with K-means clustering, based on the code found here:

https://medium.com/@cmukesh8688/silhouette-analysis-in-k-means-clustering-cefa9a7ad111

However, when I run the code (using my own data frame) I get different results. In some cases the optimal number of clusters is 2, while in others it is 5. Can anyone explain why this is happening?

Nassim Hafici On

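The KMeans algorithm initializes the cluster centers randomly and then refines them iteratively (by alternating point assignment and centroid updates, not by gradient descent).

Because of this random initialization, different runs can converge to different clusterings, so the silhouette scores, and the k that maximizes them, can change from run to run. If the optimum keeps shifting, your data may not be well suited to K-means.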
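To see how sensitive the chosen k is to initialization, one option is to repeat the silhouette analysis over several seeds and tally which k wins each time. This is only a minimal sketch, not the code from the linked article: the `make_blobs` data is a synthetic stand-in for your data frame, and the helper name `best_k_for_seed`, the range of k, and the number of seeds are illustrative assumptions.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the real data frame (assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=42)

K_VALUES = range(2, 7)  # candidate numbers of clusters (assumption)


def best_k_for_seed(X, seed):
    """Return the k with the highest silhouette score for one seed."""
    scores = {}
    for k in K_VALUES:
        # A single purely random start, so the seed has the most influence.
        km = KMeans(n_clusters=k, init="random", n_init=1, random_state=seed)
        scores[k] = silhouette_score(X, km.fit_predict(X))
    return max(scores, key=scores.get)


# Count how often each k comes out on top across 10 different seeds.
winners = Counter(best_k_for_seed(X, seed) for seed in range(10))
print(winners)
```

If the tally is spread over several values of k, the silhouette optimum depends on the initialization for that data. Raising `n_init` makes each fit less dependent on a single random start.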

Try running your analysis with the random state set to 0 at each iteration, like this:

km = KMeans(n_clusters=k, random_state=0)

Does this lead to the same optimum each time?
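Put together, a reproducible version of the silhouette loop might look like the sketch below. It is not the article's code: the `make_blobs` data stands in for your data frame, and the function name `best_k_by_silhouette`, the range of k, and `n_init=10` are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    scores = {}
    for k in k_values:
        # Fixing random_state makes the initialization, and so the fit, repeatable.
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the real data frame (assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

best_k, scores = best_k_by_silhouette(X, range(2, 8))
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

With the seed fixed, rerunning the function on the same data should return the same best k. Note that this makes the result repeatable, not necessarily right: if a different seed gives a different optimum, the cluster structure itself is probably weak.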