k-means clustering for Testing data classification


I want to use k-means clustering to classify testing data based on training data, both of which have 3 classes (1, 2 and 3).

How would I classify the testing data set using, e.g., k=10 clusters in k-means (e.g. using Matlab)? I know that I can use k=3 and then use nearest neighbour to label each data point by its nearest cluster center, but I'm not sure what I would do for values other than k=3. How would you label each of those 10 clusters?

Thanks


There are 2 best solutions below

On BEST ANSWER

It is a little unclear what exactly you want to do, but here is an outline based on what I understand.

Clustering is normally applied to data without labels: you use it either to gain insight into the data or as a pre-processing step.

However, if you want to cluster the data and then assign a class ID to a new data point based on its nearest cluster center, you can do the following.

First, select k by bootstrapping or another method, for example by comparing silhouette coefficients. Once you have the cluster centers, find the center closest to the new data point and assign the class ID of that cluster.
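As a rough illustration of the silhouette idea, here is a minimal pure-Python sketch. The function name `silhouette_score` and its inputs (`points` as coordinate tuples, `assignments` as one cluster index per point) are my own choices for illustration, not something from Matlab or from your code. You would run k-means for several values of k, compute this score for each result, and prefer the k with the highest score.

```python
import math

def silhouette_score(points, assignments):
    """Mean silhouette coefficient over all points (closer to 1 is better)."""
    # Group the points by their cluster index.
    clusters = {}
    for p, c in zip(points, assignments):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    total = 0.0
    for p, c in zip(points, assignments):
        own = clusters[c]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(mean_dist(p, m) for k, m in clusters.items() if k != c)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible clustering
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters cut across the groups
```

On this toy data the sensible assignment scores much higher than the one that mixes the two groups, which is the comparison you would make across candidate values of k.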

In such cases you may also want to use the Rand index or the adjusted Rand index to measure the quality of the clustering against the known class labels.
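For reference, the plain Rand index is simple enough to sketch by hand. The function below is an illustrative helper of my own (the name `rand_index` is an assumption, not a library call): it counts the fraction of point pairs on which the two labelings agree, i.e. both put the pair together or both put it apart. The adjusted variant additionally corrects for chance agreement and is not shown here.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair grouped together in labeling A?
        same_b = labels_b[i] == labels_b[j]  # pair grouped together in labeling B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

true_classes = [1, 1, 2, 2]
matching = rand_index(true_classes, [0, 0, 1, 1])  # same grouping, different names
crossed = rand_index(true_classes, [0, 1, 0, 1])   # grouping cuts across the classes
```

Note that the cluster indices do not need to match the class IDs; only the grouping matters, which is why this works when k differs from the number of classes.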

On

The classification of 10 clusters would be no different than the classification of 3 clusters. The number of clusters given by k-means is independent of the number of "classes" in the data. k-means is an unsupervised learning algorithm, meaning that it gives no consideration to the class of the training data during training.

The algorithm would look something like this:

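distances = dist(test_point, cluster_centers)   # distance from the test point to every center
cluster = clusters[ argmin(distances) ]         # index of the smallest distance, not the distance value
class = mode(cluster.class)                     # most common training label in that cluster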

where we find the cluster with minimum distance between the cluster center and our test point, then we find the most common class label among the elements contained in that minimally-distant cluster.
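To make that concrete, here is a small pure-Python sketch of the same idea. It assumes you already have the k-means output for the training data: the cluster centers and one cluster index per training point. The names `label_clusters` and `classify`, and the toy data, are hypothetical and only for illustration. Each cluster is labeled once by majority vote, so it works for any k, not just k equal to the number of classes.

```python
import math
from collections import Counter

def label_clusters(assignments, train_labels):
    """Map each cluster index to the most common class among its training points."""
    votes = {}
    for cluster, label in zip(assignments, train_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}

def classify(test_point, centers, cluster_to_class):
    """Return the class of the labeled cluster whose center is nearest to test_point."""
    # Only clusters that received at least one training point have a label.
    nearest = min(cluster_to_class, key=lambda i: math.dist(test_point, centers[i]))
    return cluster_to_class[nearest]

# Toy example: 4 clusters, 3 classes (assumed k-means output, not computed here).
centers = [(0, 0), (1, 0), (10, 10), (20, 0)]
assignments = [0, 0, 1, 1, 1, 2, 2, 3]   # cluster index of each training point
train_labels = [1, 1, 1, 1, 2, 2, 2, 3]  # known class of each training point

cluster_to_class = label_clusters(assignments, train_labels)
predictions = [classify(p, centers, cluster_to_class) for p in [(0.8, 0), (9, 9), (19, 1)]]
```

Here clusters 0 and 1 both end up labeled as class 1, which is exactly what happens when k is larger than the number of classes: several clusters can share a class.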