I have a matrix x holding text data transformed with tf-idf, one row per document.
Then I compute the pairwise similarity between all rows of x using sklearn's cosine_similarity() function and build a linkage_matrix with Ward linkage using scipy.cluster.hierarchy. This produces a hierarchical clustering, but I cannot figure out how to calculate the distance of each observation from each cluster centroid.
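A minimal sketch of this pipeline, using a small hypothetical corpus in place of the real data. One caveat: linkage() treats a square 2-D array as a set of observation vectors, not as a distance matrix, so a cosine_similarity() matrix passed to it directly would be clustered as if each row of similarities were a feature vector. Because TfidfVectorizer normalises rows to unit length by default, Euclidean distance between rows is a monotonic function of cosine distance, which makes Ward (a Euclidean method) a reasonable fit on the tf-idf rows themselves:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the real text data.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices rose sharply today",
    "markets and stock prices fell today",
]

# TfidfVectorizer L2-normalises each row by default (norm="l2").
x = TfidfVectorizer().fit_transform(docs).toarray()

# For unit-length rows: euclidean = sqrt(2 * cosine distance),
# so both metrics rank pairs of documents the same way.
euclid = pdist(x, metric="euclidean")
cosine = pdist(x, metric="cosine")
print(np.allclose(euclid, np.sqrt(2 * cosine)))

# Ward linkage on the observation vectors (Euclidean by definition).
linkage_matrix = linkage(x, method="ward")
print(linkage_matrix.shape)  # (n_samples - 1, 4)
```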
When using KMeans from sklearn, I found that I can calculate this by calling the transform() method on x, which returns a matrix of Euclidean distances between each observation and each cluster center. I would like to do something similar with scipy.cluster.hierarchy.
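For reference, here is what KMeans.transform() returns, sketched on hypothetical 2-D points: one row per observation and one column per cluster center, which can be reproduced by hand from cluster_centers_:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two obvious groups.
x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x)

# transform() gives an (n_samples, n_clusters) matrix of Euclidean
# distances from each observation to each cluster center.
dist = km.transform(x)

# The same matrix computed manually from the fitted centers.
manual = np.linalg.norm(x[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
print(dist.shape)  # (4, 2)
print(np.allclose(dist, manual))
```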
I have tried examining the returned linkage_matrix, as well as scipy.spatial.distance.pdist, but neither seems to give what I need.
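Neither object stores centroids, which is why they don't answer the question directly. A small illustration on hypothetical points: pdist returns the condensed vector of pairwise distances between observations, and each row of the linkage matrix records one merge as [cluster_i, cluster_j, merge_distance, number_of_points]:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data.
x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

# pdist: the n*(n-1)/2 pairwise distances between observations,
# not distances to any centroid.
d = pdist(x)
print(d.shape)              # (6,)
print(squareform(d).shape)  # (4, 4) square form of the same distances

# linkage: one row per merge, n - 1 rows in total.
Z = linkage(x, method="ward")
for i, j, height, size in Z:
    print(int(i), int(j), round(height, 3), int(size))
```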
Is there any way to achieve this?
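Cut the tree into flat clusters with fcluster, e.g. Z = fcluster(linkage_matrix, threshold, criterion='distance'), where threshold is the maximum cophenetic distance (merge height) allowed between two points in the same cluster.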
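To see how threshold behaves, here is a sketch on hypothetical toy points: the threshold is compared against the merge heights in column 2 of the linkage matrix, so raising it yields fewer, larger clusters. If you would rather fix the number of clusters, as with KMeans, fcluster also accepts criterion="maxclust":

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: two tight pairs of points far apart.
x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
linkage_matrix = linkage(x, method="ward")

# Merge heights (column 2): each pair joins at 1.0, the two pairs at 20.0.
print(linkage_matrix[:, 2])

# criterion="distance": points in a flat cluster have cophenetic
# distance <= threshold, so the cluster count drops as threshold grows.
counts = {}
for threshold in (0.5, 5.0, 25.0):
    labels = fcluster(linkage_matrix, t=threshold, criterion="distance")
    counts[threshold] = len(np.unique(labels))
print(counts)  # {0.5: 4, 5.0: 2, 25.0: 1}

# Alternative: request at most 2 flat clusters directly.
labels_k2 = fcluster(linkage_matrix, t=2, criterion="maxclust")
print(labels_k2)
```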
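Z is a 1-D array that assigns a cluster number to each point. With it, you can compute each cluster's centroid (the mean of its members) and then the distance of each observation from each centroid.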
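A sketch of that last step, assuming Euclidean distance and defining each centroid as the mean of the cluster's members (hierarchical clustering does not store centroids itself). The toy data and the cut height of 5.0 are hypothetical; the result is an (n_samples, n_clusters) matrix analogous to KMeans.transform():

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist

# Hypothetical toy data standing in for the tf-idf matrix.
x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
linkage_matrix = linkage(x, method="ward")

# Flat cluster label (1..k) for each observation.
labels = fcluster(linkage_matrix, t=5.0, criterion="distance")

# Centroid of each flat cluster = mean of its member rows.
cluster_ids = np.unique(labels)
centroids = np.vstack([x[labels == c].mean(axis=0) for c in cluster_ids])

# Column k holds each observation's distance to the centroid of cluster_ids[k].
distances = cdist(x, centroids, metric="euclidean")
print(distances.shape)  # (4, 2)
```

If x is the sparse matrix returned by TfidfVectorizer, convert it with x.toarray() first, since cdist expects dense arrays; for very large corpora that conversion may be memory-heavy.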
Further reading: https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.fcluster.html#scipy.cluster.hierarchy.fcluster