I want to run the buckshot algorithm in R, which combines HAC (hierarchical agglomerative clustering) with k-means clustering. To do that, I want to seed k-means with several points per cluster; for example, one cluster has three seeds. This is my code:
# iris data: k-means
iristr <- read.csv("iristr.CSV", header = TRUE)
str(iristr)
iristr.m <- as.matrix(iristr[,1:4])
km <- kmeans(iristr.m, centers = 3)
km
table(km$cluster,iristr$Species)
# iris data: buckshot
irists <- read.csv("irists.csv", header = TRUE)
str(irists)
irists.m <- as.matrix(irists[,1:4])
dm <- dist(irists.m, method = "euclidean")
hc <- hclust(dm, method = "complete")
plot(hc)
clusterCut <- cutree(hc,3)
clusterCut
i1 <- iristr.m[c(1,4,12),] # seeds for the first cluster (a cluster can have several seeds)
i1
i2 <- iristr.m[c(2,5,8),] # seeds for the second cluster
i2
i3 <- iristr.m[c(3,6,7,9,10,11),] # seeds for the third cluster
i3
buckshot <- kmeans(iristr.m, centers=i1,i2,i3) # problem: only "i1" is used as centers
buckshot
table(buckshot$cluster,iristr$Species)
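One possible way to get the seeding asked about above is sketched below. kmeans() takes a single `centers` argument (a number or one matrix with one row per cluster); further unnamed arguments are matched to `iter.max` and `nstart`, not added as centers. So the seeds of each hierarchical cluster can be averaged into one centroid, and the centroids passed together as one matrix. The sketch assumes the built-in `iris` data in place of the CSV files, a sample of about sqrt(k * n) points, and centroid averaging as the way to combine seeds; none of these come from the original code.

```r
# Assumption: the built-in iris data stands in for iristr.CSV / irists.csv.
set.seed(42)                          # make the random sample reproducible
x <- as.matrix(iris[, 1:4])

# 1. Draw a small random sample (assumed size: about sqrt(k * n) points).
k <- 3
n_sample <- ceiling(sqrt(k * nrow(x)))
idx <- sample(nrow(x), n_sample)
xs <- x[idx, , drop = FALSE]

# 2. Hierarchical clustering on the sample, cut into k groups.
hc <- hclust(dist(xs, method = "euclidean"), method = "complete")
groups <- cutree(hc, k)

# 3. Average the seeds of each group: one centroid per group (k x 4 matrix).
seeds <- rowsum(xs, groups) / as.vector(table(groups))

# 4. Run k-means on the full data, starting from those centroids.
buckshot <- kmeans(x, centers = seeds)
table(buckshot$cluster, iris$Species)
```

With this layout the number of seed points per group no longer matters, because each group contributes exactly one row to `seeds`.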
Here is an example of applying the k-means clustering algorithm to the iris data.
Columns 1-4 of the iris data (the features) are assigned to the variable x, and the class (species) to the variable y.
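A minimal sketch of that assignment, assuming R's built-in `iris` data frame is the data set meant here:

```r
# Assumption: the built-in iris data frame is used.
x <- iris[, 1:4]     # the four numeric features
y <- iris$Species    # the known class labels (a factor with 3 levels)
str(x)
levels(y)
```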
In k-means, the initial cluster centers are chosen at random. Since we know the data contain 3 species, the number of clusters can be set to 3. Because the result depends on those random starting centers, nstart can be set to 10: kmeans() then tries 10 different random sets of initial centers and keeps the run with the lowest total within-cluster sum of squares (WCSS), that is, the sum of squared distances from each point to the center of its cluster. A higher value of nstart makes the algorithm try more random starts.
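The call described above could look like the following sketch. The seed value and the use of the built-in `iris` data are assumptions made only so the example is reproducible.

```r
set.seed(1)                  # assumed seed, only for reproducibility
x <- iris[, 1:4]             # assumption: built-in iris data

# 3 clusters, 10 random starts; the run with the lowest total WCSS is kept.
km <- kmeans(x, centers = 3, nstart = 10)

km$tot.withinss              # total within-cluster sum of squares of the best run
km$size                      # number of points in each cluster
```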
To measure the error, the clustering result is then compared with the known species (classes) in the iris data.
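One way to make that comparison is a cross-table of cluster against species. Cluster numbers are arbitrary, so the sketch below counts the majority species in each cluster as correct; that matching rule is an assumption, not something the text specifies.

```r
set.seed(1)                  # assumed seed, only for reproducibility
x <- iris[, 1:4]             # assumption: built-in iris data
y <- iris$Species
km <- kmeans(x, centers = 3, nstart = 10)

# Rows are k-means clusters, columns are the true species.
tab <- table(cluster = km$cluster, species = y)
tab

# Treat the most frequent species in each cluster as that cluster's label.
correct <- sum(apply(tab, 1, max))
error_rate <- 1 - correct / length(y)
error_rate
```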
Finally, the result is visualized by plotting sepal length on the x-axis and sepal width on the y-axis (you can choose a different pair of features).
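A base-graphics sketch of such a plot, again assuming the built-in `iris` data; the colours, symbols and title are arbitrary choices.

```r
set.seed(1)                  # assumed seed, only for reproducibility
x <- iris[, 1:4]             # assumption: built-in iris data
km <- kmeans(x, centers = 3, nstart = 10)

# Colour each point by its assigned cluster.
plot(x$Sepal.Length, x$Sepal.Width, col = km$cluster, pch = 19,
     xlab = "Sepal length", ylab = "Sepal width",
     main = "k-means clusters on iris")

# Mark the fitted cluster centers with stars.
points(km$centers[, c("Sepal.Length", "Sepal.Width")], pch = 8, cex = 2)
```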