I am not able to understand how SciPy Hierarchical Clustering computes distance between original points or clusters in dendogram.
import scipy.cluster.hierarchy as hclus
import numpy
import cPickle
distmatrix = cPickle.load(open("mydistmatrix.pkl", "rb"))
print distmatrix
dendogram = hclus.linkage(distmatrix, method="complete")
numpy.savetxt("mydendogram.txt", dendogram, fmt='%.1f')
DistMatrix is as below, correctly printed. I also print mydendogram.txt, which I cannot make sense of.
Distance Matrix - I've appended i- as line number, this is not part of matrix.
0- [[ 0 11 68 60 60 60 61 7 17 73]
1- [11 0 68 52 52 51 55 17 6 73]
2- [68 68 0 90 90 91 94 73 73 6]
3- [60 52 90 0 10 11 36 62 55 92]
4- [60 52 90 10 0 2 36 63 55 92]
5- [60 51 91 11 2 0 36 63 54 93]
6- [61 55 94 36 36 36 0 63 57 96]
7- [ 7 17 73 62 63 63 63 0 11 68]
8- [17 6 73 55 55 54 57 11 0 68]
9- [73 73 6 92 92 93 96 68 68 0]]
Dendogram - I've appended step number i and new node n+i at end for readability, not part of dendogram.
0 - 4.0 5.0 3.6 2.0 - 10
1 - 2.0 9.0 13.7 2.0 - 11
2 - 1.0 8.0 15.0 2.0 - 12
3 - 0.0 7.0 15.7 2.0 - 13
4 - 3.0 10.0 17.6 3.0 - 14
5 - 12.0 13.0 33.1 4.0 - 15
6 - 6.0 14.0 66.9 4.0 - 16
7 - 15.0 16.0 148.1 8.0 - 17
8 - 11.0 17.0 208.9 10.0 - 18
Now, I can understand node 4 and 5 will merge first, as distance between them is smallest in distance matrix, but distance in matrix is 2 but 3.6 in dendogram. Similarly distance between nodes 2 and 9 is 6, but dendogram shows 13.7. Maximum distance is 208.9 of dendogram even if maximum number in distance matrix is 96. It appears that order of merging is correct, but I don't understand how distance is computed, and that is important to me identify suitable point to cut the tree to obtain the clusters. Documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage) doesn't help.
Please explain.
This helps - Use Distance Matrix in scipy.cluster.hierarchy.linkage()?
(Not sure if question should be deleted, or kept as handy reference)