After running k-means clustering on a dataset (k = 3), I tried to find the total entropy over all the clusters. (The dataset contained 500 data points in total.)
My clustering results:
Cluster 1:
Class: neutral, Count: 64, Pr(neutral): 0.30769
Class: positive, Count: 85, Pr(positive): 0.40865
Class: negative, Count: 59, Pr(negative): 0.28365
Entropy of Cluster: 1.566429
Cluster size: 208
Cluster 2:
Class: neutral, Count: 65, Pr(neutral): 0.363128
Class: positive, Count: 36, Pr(positive): 0.2011173
Class: negative, Count: 78, Pr(negative): 0.4357541
Entropy of Cluster: 1.5182706
Cluster size: 179
Cluster 3:
Class: neutral, Count: 39, Pr(neutral): 0.345132
Class: positive, Count: 30, Pr(positive): 0.265486
Class: negative, Count: 44, Pr(negative): 0.389380
Entropy of Cluster: 1.56750289
Cluster size: 113
Total Entropy: 1.549431124 (which is > 1)
This means that the 1st cluster contains data points of all 3 classes (whereas a perfect cluster would contain only a single class): of the 208 data points in cluster 1, 64 belong to the neutral class, 85 to the positive class, and 59 to the negative class, and likewise for the other 2 clusters.
I used the formula:
Entropy of a single cluster:

$$H(w) = -\sum_{c \in C} P(w_c)\,\log_2 P(w_c), \qquad P(w_c) = \frac{|w_c|}{n_w}$$

where $c$ is a classification in the set $C$ of all classifications, $P(w_c)$ is the probability of a data point being classified as $c$ in cluster $w$, $|w_c|$ is the count of points classified as $c$ in cluster $w$, and $n_w$ is the count of points in cluster $w$.

Total entropy of a clustering:

$$H(\Omega) = \sum_{w \in \Omega} \frac{n_w}{N}\,H(w)$$

where $\Omega$ is the set of clusters, $H(w)$ is a single cluster's entropy, $n_w$ is the number of points in cluster $w$, and $N$ is the total number of points.
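For reference, here is a minimal Python sketch (standard library only) that applies these two formulas to the class counts listed above; it should reproduce the per-cluster entropies and the total entropy reported in the question.

```python
from math import log2

# Class counts per cluster, taken from the results above
clusters = {
    "Cluster 1": {"neutral": 64, "positive": 85, "negative": 59},
    "Cluster 2": {"neutral": 65, "positive": 36, "negative": 78},
    "Cluster 3": {"neutral": 39, "positive": 30, "negative": 44},
}

# N = total number of points across all clusters (500 here)
N = sum(sum(counts.values()) for counts in clusters.values())

total_entropy = 0.0
for name, counts in clusters.items():
    n_w = sum(counts.values())  # n_w: number of points in cluster w
    # H(w) = -sum over classes c of P(w_c) * log2(P(w_c)), with P(w_c) = |w_c| / n_w
    h_w = -sum((k / n_w) * log2(k / n_w) for k in counts.values())
    total_entropy += (n_w / N) * h_w  # weight each cluster by its size
    print(f"{name}: size={n_w}, entropy={h_w:.6f}")

print(f"Total entropy: {total_entropy:.6f}")  # ~1.549431
```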
I used the above formulas to calculate the total entropy of the clustering, and the result I got was a value > 1. I thought entropies were supposed to lie between 0 and 1, yet I got something greater than 1. I cannot see my mistake: was my calculation wrong? (I believe I applied the formulas as intended.) Or did I miss something in the formulas? (You can check the results with a manual calculation yourselves.)




You're using Shannon Entropy, which measures uncertainty across a categorical distribution.
Because you have three classes, the maximum possible entropy is $\log_2(3) \approx 1.585$.
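As a quick check, here is a small Python sketch (standard library only) showing that a uniform three-class distribution attains exactly log2(3). It also shows one common convention if you want a value bounded by 1: dividing by log2(k). That normalization is a suggestion, not part of the formula you used.

```python
from math import log2

# A uniform distribution over the 3 classes attains the maximum entropy
p = [1 / 3, 1 / 3, 1 / 3]
h_max = -sum(q * log2(q) for q in p)
print(h_max, log2(3))  # both print ~1.5849625

# Optional: a normalized entropy in [0, 1], obtained by dividing by log2(k).
# 1.549431124 is the total entropy reported in the question.
print(1.549431124 / log2(3))  # ~0.9776
```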