I am currently working on my Ph.D. and am using k-means clustering to identify my sample for further qualitative research. Since I wanted to determine the number of clusters inductively, I went for the Calinski-Harabasz index (VRC), where you can (in short) just sum up the F values of your ANOVA for every cluster solution. The solution with the highest VRC is the one to go with. But since this alone is rather thin evidence, it is proposed to use the w value as well. In short, to get the w value for c = 5, you compute (VRC of the 6-cluster solution - VRC of the 5-cluster solution) - (VRC of the 5-cluster solution - VRC of the 4-cluster solution). You want to go for the lowest w value possible, as this indicates a homogeneous cluster setting relative to the neighbouring cluster solutions.
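To make the w calculation concrete, here is a minimal Python sketch of the formula described above. The function name `w_values` and the VRC numbers are made up purely for illustration (they are not my actual results), and it assumes you already have one VRC value per cluster solution.

```python
def w_values(vrc):
    """Compute w_k = (VRC_{k+1} - VRC_k) - (VRC_k - VRC_{k-1}).

    vrc: dict mapping number of clusters k -> VRC of that solution.
    Returns a dict k -> w_k for every k that has both neighbours.
    """
    w = {}
    for k in sorted(vrc):
        # w is only defined when the solutions k-1 and k+1 both exist
        if k - 1 in vrc and k + 1 in vrc:
            w[k] = (vrc[k + 1] - vrc[k]) - (vrc[k] - vrc[k - 1])
    return w


# Hypothetical VRC values for the 2- to 7-cluster solutions (illustration only)
vrc = {2: 100.0, 3: 120.0, 4: 150.0, 5: 190.0, 6: 170.0, 7: 160.0}

w = w_values(vrc)
best_k = min(w, key=w.get)  # cluster solution with the lowest w

for k, value in w.items():
    print(f"k={k}: w={value}")
print("lowest w at k =", best_k)
```

With these made-up numbers, w comes out negative at k = 5, which is also where the VRC has its peak. The first and last cluster solutions get no w value because one of their neighbours is missing.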
So far so good. I did that manually (I do not trust SPSS and I am too weak for R or other fancy stuff) and ran into trouble: I got negative w values. So I reached out to some people and even asked some brilliant minds (who wrote a whole book about this), but they could not help me. They just said that the VRC cannot be negative (obviously), but no one can tell me what to do if the w value is negative. From my point of view, a negative w value is definitely possible, as all that counts is whether the number goes towards 0.
So can anyone tell me what to do with a negative w value? Is it valid? Should I cancel it out? Should I invert it (currently my favoured method)? And if you have a clue: Do you know any literature regarding that?
I am really lost and I would appreciate your help. Thanks so much in advance!
Inverting the sign helps with graphical understanding, so I have gone with that so far.