I am running this relatively straightforward algorithm.
If I understand the algorithm correctly, once you have clustered down to, say, 8 clusters, you should already have the results for every number of clusters above 8, right?
Would you actually have to run the code multiple times, or how would you retrieve the intermediate clustering?
%%time
import time

import numpy as np
from sklearn import metrics
from sklearn.cluster import AgglomerativeClustering

# K, w, cont_std, cont_std_, s and db are assumed to be defined in earlier cells
for k in K:
    start_time = time.time()  # Start timing
    s[k] = []
    db[k] = []
    np.random.seed(123456)  # for reproducibility
    # A full fit is run for every k, even though the merge tree is the same each time
    model = AgglomerativeClustering(linkage='ward', connectivity=w.sparse, n_clusters=k)
    y = model.fit(cont_std)
    cont_std_['AHC_k' + str(k)] = y.labels_
    silhouette_score = metrics.silhouette_score(cont_std, y.labels_, metric='euclidean')
    print('silhouette at k=' + str(k) + ': ' + str(silhouette_score))
    s[k].append(silhouette_score)
    davies_bouldin_score = metrics.davies_bouldin_score(cont_std, y.labels_)
    print(f'davies bouldin at k={k}: {davies_bouldin_score}')
    db[k].append(davies_bouldin_score)
    end_time = time.time()  # End timing
    print(f"Time for k={k}: {end_time - start_time} seconds")  # Print the duration for the cycle
This is probably a rather roundabout way to get there, but it appears to work. I may yet try to clean this up later.
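One way to get every intermediate clustering from a single fit, given here as a rough sketch and not necessarily the approach referred to above: scikit-learn stores the merge tree in `model.children_`, where row `i` lists the two nodes that were joined to form node `n_samples + i`. Replaying only the first `n_samples - k` merges should leave exactly `k` clusters. The helper name `clusters_at_k` and the random toy data are made up for illustration; in the question's setting you would presumably fit on `cont_std` and pass `connectivity=w.sparse` as well.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def clusters_at_k(children, n_samples, k):
    """Replay the merge tree and return k clusters as lists of sample indices."""
    # Node ids 0..n_samples-1 are the leaves; each starts as its own cluster.
    members = {i: [i] for i in range(n_samples)}
    # Merge i creates node n_samples + i; stopping after n_samples - k merges
    # leaves exactly k clusters.
    for i, (left, right) in enumerate(children[: n_samples - k]):
        members[n_samples + i] = members.pop(int(left)) + members.pop(int(right))
    return [sorted(m) for m in members.values()]


# Toy data standing in for cont_std (assumption: any 2-D feature array works).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

# Fit once; compute_full_tree=True asks for the complete merge history.
model = AgglomerativeClustering(
    linkage="ward", n_clusters=2, compute_full_tree=True
).fit(X)

# One fit, many cuts: a list of index lists for each k of interest.
all_clusters = {k: clusters_at_k(model.children_, len(X), k) for k in range(2, 9)}
```

Each entry of `all_clusters` is a list of lists of row indices, so the expensive part (building the tree) runs only once and every `k` is just a cheap replay of the recorded merges.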
clusters is then a list of lists of indices. To turn that into a series of cluster labels (e.g. for scoring):
Colab notebook using the Iris dataset
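Turning a list of index lists into a flat label vector could look roughly like the sketch below. The `clusters` value and the toy feature array `X` are invented for illustration; only the two scoring calls are taken from the question.

```python
import numpy as np
from sklearn import metrics

# Hypothetical example: 6 samples split into 3 clusters, given as lists of row indices.
clusters = [[0, 3], [1, 4, 5], [2]]

n_samples = sum(len(c) for c in clusters)
labels = np.empty(n_samples, dtype=int)
for label, indices in enumerate(clusters):
    labels[indices] = label  # every row in this cluster gets the same integer label

# Toy features, only here so the scoring calls have something to run on.
X = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0],
              [0.2, 0.1], [5.1, 4.9], [4.9, 5.2]])
sil = metrics.silhouette_score(X, labels, metric="euclidean")
dbi = metrics.davies_bouldin_score(X, labels)
```

If a pandas Series is wanted, `labels` can be wrapped with `pd.Series(labels, index=...)` using the index of the original frame, assuming the index lists refer to row positions.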