I wanted to know if we get roughly the same centroid points for the exact same data set given that the initial centroid points are chosen randomly.
I'm writing a test k-means program, and the centroids from different runs don't seem to match. I wanted to know whether what I'm doing is right.
The k-means algorithm requires some initialization of the centroid positions. In most implementations, the centroids are initialized randomly, using a method such as the Forgy method or random partitioning, which means that repeated runs of the algorithm on the same data can converge to very different results.
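To make the two initialization schemes concrete, here is a minimal sketch in plain Python. The function names `forgy_init` and `random_partition_init` and the sample points are made up for illustration, and the round-robin dealing in the second function is a simplification I'm assuming so that no cluster starts empty.

```python
import random

def forgy_init(points, k, rng):
    """Forgy: pick k distinct data points as the starting centroids."""
    return [tuple(p) for p in rng.sample(points, k)]

def random_partition_init(points, k, rng):
    """Random partition: put every point in a random cluster, then use
    each cluster's mean as its starting centroid."""
    dim = len(points[0])
    # Shuffle the indices and deal them round-robin so that no cluster
    # starts empty (plain random assignment could leave one empty).
    order = list(range(len(points)))
    rng.shuffle(order)
    groups = [[] for _ in range(k)]
    for pos, idx in enumerate(order):
        groups[pos % k].append(points[idx])
    return [tuple(sum(p[d] for p in g) / len(g) for d in range(dim))
            for g in groups]

# Hypothetical sample data, only for demonstration.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
for seed in (0, 1):
    rng = random.Random(seed)
    print(seed, forgy_init(points, 2, rng), random_partition_init(points, 2, rng))
```

Forgy tends to spread the starting centroids across the data, while random partition tends to start them all near the overall mean. Either way, a different seed gives a different starting point.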
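Remember that k-means is iterative: at each "move centroid" step, each centroid is moved to the mean of the points currently assigned to it, which minimizes the sum of squared distances to those points. Each step only improves on the current configuration, so the algorithm converges to a local optimum, and which one it reaches depends heavily on the starting positions.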
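As a concrete illustration of that dependence, here is a small sketch of the assign/move loop in plain Python, run from two hand-picked starting positions on the same four points (the corners of a 10 x 2 rectangle). The data, the starting centroids, and the helper names `sq_dist` and `kmeans` are all made up for this example; this is not meant as a reference implementation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Run the assign/move loop from the given starting centroids.
    Returns (centroids, labels, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # nothing changed, so we have converged
            break
        labels = new_labels
        # Move step: each centroid becomes the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if its cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, sse

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Start A: both centroids on the left edge. Converges to a top/bottom split.
cents_a, labels_a, sse_a = kmeans(points, [(0.0, 0.0), (0.0, 2.0)])
print(cents_a, sse_a)  # [(5.0, 0.0), (5.0, 2.0)] 100.0

# Start B: one centroid on each side. Converges to a left/right split.
cents_b, labels_b, sse_b = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
print(cents_b, sse_b)  # [(0.0, 1.0), (10.0, 1.0)] 4.0
```

Both runs are fully converged (another iteration changes nothing), yet start A ends with a sum of squared errors 25 times larger than start B. So getting different centroids from different random starts is expected behaviour, not necessarily a bug in your program.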
Because of this, it's usually advisable to run k-means several times with different random initializations, and select the clustering with the lowest error (typically the within-cluster sum of squared distances).
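A rough sketch of that restart strategy in plain Python follows. It assumes Forgy initialization and uses the sum of squared distances as the error; the names `kmeans_once` and `best_of`, the number of runs, and the four sample points are arbitrary choices for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random Forgy start. Returns (centroids, sse)."""
    centroids = rng.sample(points, k)  # k distinct data points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if its cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, sse

def best_of(points, k, n_runs, seed=0):
    """Run k-means n_runs times and keep the run with the lowest error."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[1]), runs

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
(best_centroids, best_sse), runs = best_of(points, k=2, n_runs=10)
print("errors of individual runs:", [sse for _, sse in runs])
print("kept:", best_centroids, best_sse)
```

The kept result is never worse than any single run, and more restarts make it more likely that at least one of them lands in a good optimum. Fixing the seed, as `best_of` does, also makes the output reproducible, which is handy when you are testing your own implementation.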