I'm new to R and I want to implement the following algorithm:
Step 1. Pick one of the dataset points randomly as the center of the first cluster
Step 2. For the next cluster, find the point that is farthest from the center of the previous cluster and has not already been chosen as a center
Step 3. Then, choose this point as the center of the next cluster
Step 4. Repeat steps 2 and 3 until you initialize the centers of all clusters
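For reference, the four steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the data is a numeric matrix with one row per point, distance means Euclidean distance, and the function name pick_centers and its first argument are made up for illustration (they are not from any package).

```r
# Hypothetical helper, not part of any library.
# x: numeric matrix, one row per data point; k: number of clusters.
# first: optional row index of the first center (random if NULL).
pick_centers <- function(x, k, first = NULL) {
  x <- as.matrix(x)
  stopifnot(k >= 1, k <= nrow(x))
  if (is.null(first)) first <- sample(nrow(x), 1)  # Step 1: random first center
  idx <- integer(k)                  # row indices of the chosen centers
  idx[1] <- as.integer(first)
  for (i in seq_len(k)[-1]) {        # Step 4: repeat for centers 2..k
    prev <- x[idx[i - 1], ]          # coordinates of the previous center
    # Euclidean distance from every point to the previous center
    d <- sqrt(rowSums(sweep(x, 2, prev)^2))
    d[idx[seq_len(i - 1)]] <- -Inf   # Step 2: skip points already chosen
    idx[i] <- which.max(d)           # Step 3: farthest remaining point
  }
  idx
}

# Toy data: four one-dimensional points
x <- matrix(c(0, 1, 5, 10), ncol = 1)
centers_idx <- pick_centers(x, 3, first = 1)
x[centers_idx, , drop = FALSE]       # the chosen center points themselves
```

The function returns row positions rather than values, so the chosen centers can always be looked up in the original data. With the objects from this post, something like pick_centers(matrix(unlist(means), ncol = 1), K) would presumably give positions that also index img_list, assuming means is in the same order as img_list.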
I have attempted to write this algorithm. I can compute the distances, but I cannot match them back to the original points or carry the iteration through to get the remaining 25 centers.
Can someone help me, please?
library(jpeg)  # provides readJPEG
img_list = list.files()
img_list
img_mat_list <- as.matrix(lapply(img_list,readJPEG))
img_mat_list
images = as.matrix(do.call(rbind,img_mat_list))
dim(images)
[1] 2184 12
means = as.matrix(lapply(img_mat_list, mean))
s1 = sample(images, 30)
> dput(s1)
c(0.141176470588235, 1, 1, 0.682352941176471, 1, 0.925490196078431,
0.0274509803921569, 0.00784313725490196, 0.364705882352941, 0.96078431372549,
0, 0.16078431372549, 0.972549019607843, 0.0274509803921569, 1,
0.929411764705882, 0.00392156862745098, 0.972549019607843, 1,
1, 0.6, 0, 0.23921568627451, 0, 0.988235294117647, 0.0156862745098039,
0, 0.945098039215686, 0, 0.996078431372549)
> s2 = sample(means, 30)
> dput(s2)
list(0.621813725490196, 0.666421568627451, 0.51797385620915,
0.53287037037037, 0.489297385620915, 0.678513071895425, 0.693845315904139,
0.618600217864924, 0.567892156862745, 0.64332788671024, 0.342565359477124,
0.568082788671024, 0.589351851851852, 0.602205882352941,
0.689025054466231, 0.460484749455338, 0.71266339869281, 0.479575163398693,
0.677941176470588, 0.602205882352941, 0.466530501089325,
0.516884531590414, 0.568082788671024, 0.604738562091503,
0.557080610021786, 0.544580610021786, 0.619226579520697,
0.515032679738562, 0.524754901960784, 0.516884531590414)
centers = list()
K = 26
center = sample(means, 1)
distance = function(point, group) {
  return(dist(t(array(c(point, t(group)), dim=c(ncol(group), 1+nrow(group)))))[1:nrow(group)])
}
for (i in 1 : length(K))
  for (j in 1 : length(means))
    distances = distance(center, means)
centers[i] = which.max(distances)
distances
[1] 0.027151416 0.035185185 0.018899782 0.027151416
[5] 0.035185185 0.018899782 0.027151416 0.126633987
[9] 0.126443355 0.126443355 0.075435730 0.126633987
> centers
[1] 60 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[17] NA NA NA NA NA NA NA NA NA 60
Here distances is a vector of 182 distances, and centers is supposed to hold the centers of the clusters.
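To illustrate the matching problem in isolation: which.max returns a position in the distance vector, and as long as the distances are in the same order as the points, that position indexes the original data directly. The sketch below uses made-up toy values and file names as stand-ins for means and img_list, and assumes each point is a single number so the distance reduces to an absolute difference.

```r
# Toy stand-ins for the post's objects (values and file names are made up).
files     <- c("a.jpg", "b.jpg", "c.jpg", "d.jpg")
means_vec <- c(0.62, 0.34, 0.71, 0.52)    # plain numeric vector, e.g. unlist(means)
center_i  <- 2                            # position of the current center

d     <- abs(means_vec - means_vec[center_i])  # distance of every point to the center
far_i <- which.max(d)                          # position of the farthest point

files[far_i]      # file the farthest point came from
means_vec[far_i]  # value of the farthest point
```

Keeping positions (center_i, far_i) instead of values is a design choice: a position can be reused against any object that shares the same ordering.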