K-Means - Why does the optimal number of clusters vary with Silhouette Analysis?

451 Views Asked by Robin_hood_963 At 16 December 2021 at 08:53

I am using Silhouette Analysis with K-means clustering, using the code found here:

However, when I run the code (using my own data frame), I get different results. In some cases the optimal number of clusters is 2, while in others it is 5. Can anyone explain why this is happening?

There is 1 best solution below

Nassim Hafici On 17 December 2021 at 06:11

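The KMeans algorithm starts by choosing the initial cluster centers randomly and then refines them iteratively (strictly speaking with Lloyd's algorithm, which alternates between assigning points to the nearest center and recomputing the centers, rather than with gradient descent).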

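Because of this random initialization, different runs can converge to different local optima, so the silhouette scores, and therefore the best number of clusters, can change between runs. If the results vary a lot, your data may not be well suited to this kind of clustering.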

Try running your analysis with a fixed random state (for example 0) at each iteration, like:

km = KMeans(n_clusters=k, random_state=0)

Does this lead to the same optimum every time?
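For reference, here is a minimal sketch of what the whole loop could look like with a fixed seed. It is not the code from the question: the helper name best_k_by_silhouette, the range of k values, the explicit n_init=10, and the synthetic make_blobs data (standing in for your data frame) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Return (best_k, scores), where scores maps k to its mean silhouette score.

    Hypothetical helper written for this answer, not part of scikit-learn.
    """
    scores = {}
    for k in k_values:
        # Fixing random_state makes the centroid initialization reproducible.
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # Pick the k with the highest mean silhouette score.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data standing in for the data frame from the question (assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Two identical runs with the same seed should select the same k.
first_k, first_scores = best_k_by_silhouette(X, range(2, 7))
second_k, second_scores = best_k_by_silhouette(X, range(2, 7))
print("best k (run 1):", first_k)
print("best k (run 2):", second_k)
```

If the selected k still changes when you vary random_state, that suggests the silhouette scores for several values of k are close to each other, and the choice between them is not strongly supported by the data.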