Silhouette plot from dendrogram in R

Question

Silhouette plot from dendrogram in R

806 Views Asked by YYM17 At 15 March 2022 at 00:44

I'm using hierarchical clustering for my data. And I'd like to plot the Silhouette score across different number of cluster. I have searched a lot posts, and I found that I need to use pam for the clustering in order to plot the Silhouette score. I'm wodering if there is a way to plot based on hierarchical clustering result?

Here is a sample data:

structure(list(safe = c(1.5, 1.5, 2, 1, 1, 1, 2, 1.5, 1, 1, 1.5, 
1, 2, 1, 3.5, 1, 1, 1.5, 1.5, 1, 1, 1, 1, 2, 1.5, 1, 1.5, 1, 
1, 1.5, 2.5, 1, 2, 1, 1.5, 1, 1.5, 1, 1, 1, 2.5, 2, 1, 1, 1, 
2, 1, 1.5, 1.5, 1.5, 1.5, 3, 3, 1, 1.5, 2, 1, 1.5, 1, 1.5, 1, 
1.5, 1, 1, 1, 1.5, 1, 1, 1, 1.5, 1.5, 1.5, 1, 2, 1, 1, 1, 1, 
1, 1, 1.5, 1, 1, 2, 1, 1, 1, 1.5, 1, 1.5, 1.5, 1, 1, 1, 1, 1.5, 
1, 2, 1.5, 1, 1), nhood.soccapi = c(1.8, 1.6, 2.8, 2.2, 2, 3.6, 
3.2, 1.8, 1.6, 1.8, 1.4, 3.8, 1.6, 1, 2, 2.2, 1, 1, 2, 1, 1, 
2.2, 1, 1.4, 1.8, 2, 2.4, 1.8, 1, 2, 2.2, 1.6, 2.2, 1.2, 2, 2, 
1, 2, 2, 2, 2, 1.8, 1.4, 1.6, 1, 2.2, 1.4, 1.6, 2.4, 1.2, 1.4, 
2.4, 2, 2.4, 2, 1.8, 1.6, 1, 1.2, 1.8, 2, 3.4, 2, 2, 1, 2, 1, 
1.4, 2, 2.8, 2, 1, 2, 1.2, 1.4, 2, 1.6, 1.4, 2.2, 2.4, 1.8, 1.4, 
1.4, 2, 2.2, 1, 2, 2.2, 1, 1.8, 2.2, 1.6, 1.2, 1, 2, 1, 1.2, 
2.6, 1.8, 1.8, 1.8), DISORDER = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, NA, 1L, 1L, 1L, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 2L, 1L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, NA, NA, 1L, 1L, 1L, 2L, 1L, 1L, NA, 2L, 
NA, 1L, 1L, 1L, NA, 1L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 2L, 1L, NA, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, NA, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L, NA, NA, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, NA, 1L, 1L, 1L, 1L, 1L, NA, 2L, NA, 2L
), .Label = c("0", "1"), class = "factor"), Scho_socialdep1000_3 = structure(c(3L, 
3L, 3L, 3L, 3L, 2L, NA, 3L, 2L, 1L, 2L, 1L, NA, 2L, NA, NA, 1L, 
2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, NA, NA, 1L, 2L, 2L, 2L, 
1L, 1L, 2L, 3L, NA, 1L, 1L, 2L, NA, 2L, 2L, 1L, 2L, NA, 1L, 3L, 
2L, 1L, 1L, NA, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 
2L, 1L, 1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, NA, 1L, 1L, 
1L, NA, NA, 1L, 3L, 3L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 3L, 2L, 1L, 
NA, 2L, NA, 3L), .Label = c("0", "1", "2"), class = "factor")), row.names = 100:200, class = "data.frame")

And my cluster code:

dist <- daisy(dt, metric = "gower")
cls.env <- hclust(dist, method = "average")

Original Q&A

There are 1 best solutions below

**danlooo** · Accepted Answer · 2022-03-15T08:27:14.003000

hclust returns just a dendrogram representing clusters inside clusters. The silhouette score is defined on one specific clustering and not on all possible clusterings. This is why it works with Partitioning Arround Medoids (PAM) out of the box. In hierarchical clustering, however, one needs to decide first which clustering to chose by cutting dendrogram tree.

This is how to plot the silhouettes for a hierarchical clustering using k=5 clusters:

library(tidyverse)
library(cluster)

data <- structure(list(safe = c(
  1.5, 1.5, 2, 1, 1, 1, 2, 1.5, 1, 1, 1.5,
  1, 2, 1, 3.5, 1, 1, 1.5, 1.5, 1, 1, 1, 1, 2, 1.5, 1, 1.5, 1,
  1, 1.5, 2.5, 1, 2, 1, 1.5, 1, 1.5, 1, 1, 1, 2.5, 2, 1, 1, 1,
  2, 1, 1.5, 1.5, 1.5, 1.5, 3, 3, 1, 1.5, 2, 1, 1.5, 1, 1.5, 1,
  1.5, 1, 1, 1, 1.5, 1, 1, 1, 1.5, 1.5, 1.5, 1, 2, 1, 1, 1, 1,
  1, 1, 1.5, 1, 1, 2, 1, 1, 1, 1.5, 1, 1.5, 1.5, 1, 1, 1, 1, 1.5,
  1, 2, 1.5, 1, 1
), nhood.soccapi = c(
  1.8, 1.6, 2.8, 2.2, 2, 3.6,
  3.2, 1.8, 1.6, 1.8, 1.4, 3.8, 1.6, 1, 2, 2.2, 1, 1, 2, 1, 1,
  2.2, 1, 1.4, 1.8, 2, 2.4, 1.8, 1, 2, 2.2, 1.6, 2.2, 1.2, 2, 2,
  1, 2, 2, 2, 2, 1.8, 1.4, 1.6, 1, 2.2, 1.4, 1.6, 2.4, 1.2, 1.4,
  2.4, 2, 2.4, 2, 1.8, 1.6, 1, 1.2, 1.8, 2, 3.4, 2, 2, 1, 2, 1,
  1.4, 2, 2.8, 2, 1, 2, 1.2, 1.4, 2, 1.6, 1.4, 2.2, 2.4, 1.8, 1.4,
  1.4, 2, 2.2, 1, 2, 2.2, 1, 1.8, 2.2, 1.6, 1.2, 1, 2, 1, 1.2,
  2.6, 1.8, 1.8, 1.8
), DISORDER = structure(c(
  1L, 1L, 1L, 1L, 1L,
  1L, NA, 1L, 1L, 1L, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 2L, 1L, 1L,
  1L, 1L, 2L, 1L, 1L, 1L, NA, NA, 1L, 1L, 1L, 2L, 1L, 1L, NA, 2L,
  NA, 1L, 1L, 1L, NA, 1L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 2L, 1L, NA,
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, NA, 1L, 1L, 1L,
  2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L, NA, NA, 1L,
  1L, 2L, 2L, 2L, 1L, 1L, NA, 1L, 1L, 1L, 1L, 1L, NA, 2L, NA, 2L
), .Label = c("0", "1"), class = "factor"), Scho_socialdep1000_3 = structure(c(
  3L,
  3L, 3L, 3L, 3L, 2L, NA, 3L, 2L, 1L, 2L, 1L, NA, 2L, NA, NA, 1L,
  2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, NA, NA, 1L, 2L, 2L, 2L,
  1L, 1L, 2L, 3L, NA, 1L, 1L, 2L, NA, 2L, 2L, 1L, 2L, NA, 1L, 3L,
  2L, 1L, 1L, NA, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L,
  2L, 1L, 1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, NA, 1L, 1L,
  1L, NA, NA, 1L, 3L, 3L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 3L, 2L, 1L,
  NA, 2L, NA, 3L
), .Label = c("0", "1", "2"), class = "factor")), row.names = 100:200, class = "data.frame")



dist <- daisy(data, metric = "gover")
#> Warning in daisy(data, metric = "gover"): with mixed variables, metric "gower"
#> is used automatically
dendro <-
  dist %>%
  hclust(method = "average") %>%
  # need to put the elements into mutually exclusive sets of clusters
  cutree(k = 5)

silhouette(dendro, dist) %>%
  plot()

Please note that the choice of k is arbitrary. You will get different results selecting another value.

^{Created on 2022-03-15 by the reprex package (v2.0.0)}

Silhouette plot from dendrogram in R

There are 1 best solutions below

Related Questions in R

Related Questions in SILHOUETTE

Trending Questions

Popular # Hahtags

Popular Questions