Add labels to a dendrogram in R

I am trying to apply hierarchical clustering to time series in order to identify the states with similar behavior in residential_percent_change_from_baseline. I get the dendrogram, but the labels on the x axis are just numbers, and I want the state names. My data looks like this: [image: Data]

And this is part of my code:

data <- dataset
#Convert to factor
cols <- c("country_region_code", "country_region", "sub_region_1", "iso_3166_2_code")
data[cols] <- lapply(data[cols], factor)
sapply(data, class)
data$date <- as.Date(data$date)
summary(data)

#Data preparation
n <- 10
s <- sample(1:100, n)
i <- c(s,0+s,   279+s,  556+s,  833+s,  1110+s, 1387+s, 1664+s, 1941+s, 2218+s, 2495+s, 2772+s, 3049+s, 3326+s, 3603+s, 3880+s, 4157+s, 4434+s, 4711+s, 4988+s, 5265+s, 5542+s, 5819+s, 6096+s, 6373+s, 6650+s, 6927+s, 7204+s, 7481+s, 7758+s, 8035+s, 8312+s, 8589+s, 8866+s)
d <- data[i,3:4]
d$residential <- data[i,11]
d[, 2] <- NULL
str(d)

pattern <- c(rep('Mexico', n),
             rep('Aguascalientes', n),
             rep('Baja California',n),
             rep('Baja California Sur',n),
             rep('Campeche',n),
             rep('Coahuila',n),
             rep('Colima',n),
             rep('Chiapas',n),
             rep('Chihuahua',n),
             rep('Durango',n),
             rep('Guanajuato',n),
             rep('Guerrero',n),
             rep('Hidalgo',n),
             rep('Jalisco',n),
             rep('México City',n),
             rep('Michoacan',n),
             rep('Morelos',n),
             rep('Nayarit',n),
             rep('Nuevo León',n),
             rep('Oaxaca',n),
             rep('Puebla',n),
             rep('Querétaro',n),
             rep('Quintana Roo',n),
             rep('San Luis Potosí',n),
             rep('Sinaloa',n),
             rep('Sonora',n), 
             rep('Tabasco',n),
             rep('Tamaulipas',n),
             rep('Tlaxcala',n),
             rep('Veracruz',n),
             rep('Yucatán',n),
             rep('Zacatecas',n))
d <- data.matrix(d)
distance <- dist(d, method = 'euclidean')
hc <- hclust(distance, method="ward.D")
plot(hc, cex=.7, hang = -1, col='blue', labels=pattern)

When I don't specify labels, I get this dendrogram: [image: dendrogram with numeric labels]. But when I do, I get this error:

Error in graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid dendrogram input

I hope somebody can help me; I am a little bit tired of this.
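A likely cause of the error, inferred from the code shown rather than confirmed in the thread: `plot()` on an `hclust` object needs exactly one label per observation passed to `dist()`. In the code above, `i` holds 34 blocks of `n` indices (`s` and `0+s` are the same block), so `d` has 340 rows, while `pattern` repeats 32 names, giving only 320 labels. The minimal sketch below uses made-up data (`toy`, `short_labels` and `full_labels` are hypothetical names) to reproduce a length mismatch and to show that matching labels can either be passed to `plot()` or stored in `hc$labels`.

```r
# Minimal sketch with made-up data (not the question's dataset):
# 6 observations, so the dendrogram needs exactly 6 labels.
set.seed(1)
toy <- matrix(rnorm(12), nrow = 6)            # hypothetical 6 x 2 data
hc_toy <- hclust(dist(toy, method = "euclidean"), method = "ward.D")

short_labels <- c("A", "B", "C", "D")             # too few labels
full_labels  <- c("A", "B", "C", "D", "E", "F")   # one label per row

pdf(NULL)  # null graphics device so the sketch also runs non-interactively

# Mismatched length: plot.hclust() stops with an error
err <- tryCatch(plot(hc_toy, labels = short_labels),
                error = function(e) e)
message("Mismatch error: ", conditionMessage(err))

# Check the lengths before plotting
stopifnot(length(full_labels) == length(hc_toy$order))

# Matching length: either pass the labels to plot() ...
plot(hc_toy, hang = -1, labels = full_labels)

# ... or store them in the hclust object, so functions that read
# hc_toy$labels (e.g. as.dendrogram()) pick them up as well
hc_toy$labels <- full_labels
plot(hc_toy, hang = -1)

invisible(dev.off())
```

Applied to the question, the same check would be `length(pattern) == nrow(d)` right before calling `plot()`.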

Maybe it will work with an alternative to the base R plot function. Try ggdendroplot; it should display the labels on the axis. You will need ggplot2 for this.

devtools::install_github("nicolash2/ggdendroplot")
library(ggdendroplot)
library(ggplot2)

ggplot() + geom_dendro(hc)

If you want to modify it (rotate it, color it, etc.), check out the GitHub page: https://github.com/NicolasH2/ggdendroplot
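Whichever plotting function is used, the leaf names come from the labels stored in the `hclust` object, and those come from the row names of the data given to `dist()`. Since the question's goal is to compare states by their whole time series, one option (a swapped-in approach, not taken from the thread) is to give each state a single row with one column per date, so the state names travel through `dist()` into `hc$labels` automatically. A minimal sketch with simulated data; `long`, `states` and the column names only mimic the question's layout and are assumptions:

```r
# Simulated long-format data mimicking the question's layout (assumption):
# one row per (state, date) with a residential change value.
set.seed(42)
states <- c("Aguascalientes", "Campeche", "Colima", "Durango", "Jalisco")
dates  <- seq(as.Date("2020-03-01"), by = "day", length.out = 30)
long <- expand.grid(sub_region_1 = states, date = dates,
                    stringsAsFactors = FALSE)
long$residential <- round(rnorm(nrow(long), mean = 10, sd = 5))

# One row per state, one column per date (mean if a state/date repeats);
# tapply() uses the state names as row names of the result
mat <- tapply(long$residential,
              list(long$sub_region_1, as.character(long$date)),
              mean)

# dist() keeps the row names, so hclust() stores them in hc$labels
hc <- hclust(dist(mat, method = "euclidean"), method = "ward.D")

pdf(NULL)  # null device; drop this line when plotting interactively
plot(hc, cex = 0.7, hang = -1, col = "blue")  # leaves show state names
invisible(dev.off())
```

With the real data, `long` would be the Mexico rows and `residential` the residential_percent_change_from_baseline column. Days missing for a state become `NA` cells, which `dist()` excludes while scaling up the Euclidean sum over the remaining columns.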