I ran hierarchical agglomerative clustering (HAC) on the Iris dataset in R. However, I don't know how to calculate the centroids of the resulting clusters i1, i2, and i3.
My goal is to make a "Buckshot" clustering algorithm.
This is my code:
irists <- read.csv("irists.csv", header = TRUE)
str(irists)
irists.m <- as.matrix(irists[,1:4])
dm <- dist(irists.m, method = "euclidean")
hc <- hclust(dm, method = "complete")
plot(hc)
clusterCut <- cutree(hc,3)
clusterCut
i1 <- irists.m[c(1,4,12),] # HAC cluster result
i1
i2 <- irists.m[c(2,5,8),] # HAC cluster result
i2
i3 <- irists.m[c(3,6,7,9,10,11),] # HAC cluster result
i3
How do I calculate the centroids of i1, i2, and i3?
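To make the question concrete: by "centroid" I mean the column-wise mean of the rows that belong to a cluster. Below is a minimal sketch of what I imagine this looks like. It uses the built-in iris data frame as a stand-in for my CSV (an assumption, since my file is not attached), and the names x, cl, c1, and centroids are just illustrative. I am not sure this is the right or idiomatic way to do it.

```r
# Stand-in data: built-in iris data frame (assumption; my real data is irists.csv)
x <- as.matrix(iris[, 1:4])

# Same pipeline as in my code: Euclidean distances, complete linkage, cut into 3 clusters
hc <- hclust(dist(x, method = "euclidean"), method = "complete")
cl <- cutree(hc, k = 3)

# Centroid of a single cluster: column-wise mean of its rows
c1 <- colMeans(x[cl == 1, , drop = FALSE])

# Centroids of all clusters at once: one row per cluster, one column per feature
centroids <- apply(x, 2, function(col) tapply(col, cl, mean))
centroids
```

Grouping by the cutree labels avoids typing the row indices of each cluster by hand, which is what I did for i1, i2, and i3.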
Can I apply the same method for calculating centroids to text data (e.g. 20 Newsgroups or Reuters-21578)?