After running k-means clustering on a dataset (k = 3), I tried to find the total entropy over all the clusters. (The dataset contained 500 data points in total.)
My clustering results:
Cluster 1:
Class: neutral, Count: 64, Pr(neutral): 0.30769
Class: positive, Count: 85, Pr(positive): 0.40865
Class: negative, Count: 59, Pr(negative): 0.28365
Entropy of Cluster: 1.566429
Cluster size: 208
Cluster 2:
Class: neutral, Count: 65, Pr(neutral): 0.363128
Class: positive, Count: 36, Pr(positive): 0.2011173
Class: negative, Count: 78, Pr(negative): 0.4357541
Entropy of Cluster: 1.5182706
Cluster size: 179
Cluster 3:
Class: neutral, Count: 39, Pr(neutral): 0.345132
Class: positive, Count: 30, Pr(positive): 0.265486
Class: negative, Count: 44, Pr(negative): 0.389380
Entropy of Cluster: 1.56750289
Cluster size: 113
Total Entropy: 1.549431124 (which is > 1)
This means that the 1st cluster contains data points of all 3 classes (whereas a perfect cluster would contain only a single class): of the 208 data points in cluster 1, 64 belong to the neutral class, 85 to the positive class, and 59 to the negative class, and likewise for the other 2 clusters.
I used the formula:
Entropy of a single cluster:

$$H(w) = -\sum_{c \in C} P(w_c)\,\log_2 P(w_c), \qquad P(w_c) = \frac{|w_c|}{n_w}$$

where $c$ is a classification in the set $C$ of all classifications, $P(w_c)$ is the probability of a data point being classified as $c$ in cluster $w$, $|w_c|$ is the count of points classified as $c$ in cluster $w$, and $n_w$ is the count of points in cluster $w$.

Total entropy of a clustering:

$$H(\Omega) = \sum_{w \in \Omega} \frac{n_w}{N}\,H(w)$$

where $\Omega$ is the set of clusters, $H(w)$ is a single cluster's entropy, $n_w$ is the number of points in cluster $w$, and $N$ is the total number of points.
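For reference, here is a minimal Python sketch (standard library only) that applies these two formulas to the class counts listed above; it should reproduce the per-cluster entropies and the total entropy reported in the question.

```python
from math import log2

# Class counts per cluster, taken from the results above
clusters = {
    "Cluster 1": {"neutral": 64, "positive": 85, "negative": 59},
    "Cluster 2": {"neutral": 65, "positive": 36, "negative": 78},
    "Cluster 3": {"neutral": 39, "positive": 30, "negative": 44},
}

# N = total number of points across all clusters (500 here)
N = sum(sum(counts.values()) for counts in clusters.values())

total_entropy = 0.0
for name, counts in clusters.items():
    n_w = sum(counts.values())  # n_w: number of points in cluster w
    # H(w) = -sum over classes c of P(w_c) * log2(P(w_c)), with P(w_c) = |w_c| / n_w
    h_w = -sum((k / n_w) * log2(k / n_w) for k in counts.values())
    total_entropy += (n_w / N) * h_w  # weight each cluster by its size
    print(f"{name}: size={n_w}, entropy={h_w:.6f}")

print(f"Total entropy: {total_entropy:.6f}")  # ~1.549431
```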
I used the above formulas to calculate the total entropy of the clustering, and the result I got was a value > 1. I thought entropies were supposed to lie between 0 and 1, yet I got something greater than 1. I cannot see my mistake: was my calculation wrong? (I believe I applied the formulas as intended.) Or did I miss something in the formulas? (You can check the results with a manual calculation yourselves.)




You're using Shannon Entropy, which measures uncertainty across a categorical distribution.
Because you have three classes, the maximum possible entropy is $\log_2(3) \approx 1.585$.
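As a quick check, here is a small Python sketch (standard library only) showing that a uniform three-class distribution attains exactly log2(3). It also shows one common convention if you want a value bounded by 1: dividing by log2(k). That normalization is a suggestion, not part of the formula you used.

```python
from math import log2

# A uniform distribution over the 3 classes attains the maximum entropy
p = [1 / 3, 1 / 3, 1 / 3]
h_max = -sum(q * log2(q) for q in p)
print(h_max, log2(3))  # both print ~1.5849625

# Optional: a normalized entropy in [0, 1], obtained by dividing by log2(k).
# 1.549431124 is the total entropy reported in the question.
print(1.549431124 / log2(3))  # ~0.9776
```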