Ordering colors on colored bar for dendrogram in R


The vignette for the R package dendextend (https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html) gives an example of using the colored_bars function with cutreeDynamic from package dynamicTreeCut as follows:

# let's get the clusters
library(dynamicTreeCut)
data(iris)
x  <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram 

# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)

library(colorspace)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>% 
         branches_attr_by_clusters(clusters, values = cols) %>% 
         color_labels(col =   true_species_cols)
plot(dend2)
clusters <- factor(clusters)
levels(clusters)[-1]  <- cols[-5][c(1,4,2,3)] 
   # Get the clusters to have proper colors.
   # fix the order of the colors to match the branches.

colored_bars(clusters, dend, sort_by_labels_order = FALSE)

The following line reorders the colors to match the branches:

levels(clusters)[-1]  <- cols[-5][c(1,4,2,3)] 

This example uses a custom color ordering for the iris data. I would like to apply this method to my own data, which has many more clusters, but it is unclear to me how the revised ordering of the colors was determined. Can anyone explain how this order was determined, and is there a way to automate it?



Answer by Andy

Just for starters, your example code above (from data(iris) on) is missing two necessary packages: library(dplyr), for the pipe operator %>%, and library(dendextend), for color_labels() and the other dendrogram functions such as branches_attr_by_clusters() and colored_bars().

To answer your question, the color order is set in the line levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]. As you mention, this order is custom to this specific dataset, and I don't know why the authors picked it. If you do not want to set the order yourself and would rather have R do it, set sort_by_labels_order = TRUE in colored_bars(). Here it is set to FALSE because the authors use a custom order.
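One plausible reading of cols[-5][c(1,4,2,3)] can be checked with a small base-R sketch on made-up cluster IDs. It assumes (this is not stated in the vignette) that branches_attr_by_clusters() hands out colors to the non-0 clusters in the order they first appear along the dendrogram, while factor() sorts the levels numerically; match() then gives the permutation between the two. If the real iris clusters appear along the dendrogram as 1, 3, 4, 2, this yields exactly c(1,4,2,3). The toy vector and the placeholder color names are hypothetical.

```r
# Hypothetical toy data standing in for cutreeDynamic() output, already in
# dendrogram (leaf) order; 0 marks leaves left unassigned
clusters <- c(1, 1, 3, 3, 0, 4, 4, 2, 2, 0)

# The vignette's count: subtracting TRUE (i.e. 1) from every value keeps the
# 0 group in the tally, so n_clusters is one more than the real number
n_vignette <- length(unique(clusters) - (0 %in% clusters))  # 5
n_nonzero <- length(setdiff(unique(clusters), 0))           # 4

# Placeholder color names, one per counted cluster; cols[-5] drops the surplus
cols <- c("c1", "c2", "c3", "c4", "c5")
cols_used <- cols[-5]

# Assumption: branches_attr_by_clusters() gives the i-th color to the i-th
# distinct non-0 cluster met left to right along the dendrogram
appearance <- unique(clusters[clusters != 0])  # 1 3 4 2
lvls <- sort(appearance)                       # 1 2 3 4, the order factor() uses
idx <- match(lvls, appearance)                 # 1 4 2 3
level_cols <- cols_used[idx]                   # "c1" "c4" "c2" "c3"
```

In other words, under this assumption the index vector is not arbitrary: it is the position of each numerically sorted cluster label in the left-to-right order of the dendrogram.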

If it is set to TRUE then, to quote the help page directly, "the colors vector/matrix should be provided in the order of the original data order (and it will be re-ordered automatically to the order of the dendrogram)". For more information, see ?colored_bars.
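The quoted help text suggests that TRUE amounts to indexing the colors by order.dendrogram(dend). The base-R sketch below shows that reordering on the same iris clustering, assuming this indexing is all the option does (I have not checked the dendextend source). It also shows why a vector that has already been put in dendrogram order, as clusters is after clusters[order.dendrogram(dend)], would be permuted a second time if passed with TRUE. Only stats and datasets functions are used.

```r
# Base R only: the same iris clustering as in the question
x <- as.matrix(iris[, -5])
hc <- hclust(dist(x))
dend <- as.dendrogram(hc)

# order.dendrogram() returns the original row indices of the leaves, left to right
ord <- order.dendrogram(dend)

# A per-observation vector in the ORIGINAL data order (one entry per row of iris)
species_original <- as.character(iris$Species)

# Reordering it once puts it in dendrogram order; this is the step that,
# per the help text, sort_by_labels_order = TRUE is meant to do for you
species_dend <- species_original[ord]

# A vector already in dendrogram order would be permuted again (ord[ord]),
# which in general no longer lines up with the leaves
species_twice <- species_dend[ord]
```

So, under this reading, TRUE is the convenient choice when you pass the cutreeDynamic() result as it comes back (original data order), and FALSE when you have already reordered it yourself.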

The code below shows the difference between the two settings, FALSE and TRUE:

# let's get the clusters
library(dynamicTreeCut)
library(dplyr)
data(iris)
x  <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram 

# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)

library(colorspace)
library(dendextend)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>% 
  branches_attr_by_clusters(clusters, values = cols) %>% 
  color_labels(col =   true_species_cols)
clusters <- factor(clusters)
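# Get the clusters to have proper colors:
# fix the order of the colors to match the branches (custom order from the vignette)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
plot(dend2)
colored_bars(clusters, dend, sort_by_labels_order = FALSE)

# With TRUE, colored_bars() itself reorders the colors by order.dendrogram(dend),
# so it expects the colors in the original data order
plot(dend2)
colored_bars(clusters, dend, sort_by_labels_order = TRUE)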
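For data with many clusters, one way to avoid writing the permutation by hand is to build one bar color per leaf directly. The helper below, cluster_bar_colors(), and its zero_col default are hypothetical (not part of dendextend). The sketch assumes that clusters is the numeric vector already in dendrogram order and that branches_attr_by_clusters() gives cols[i] to the i-th distinct non-0 cluster encountered left to right; if that assumption holds, bars and branches end up with the same colors.

```r
# Hypothetical helper (not part of dendextend): one bar color per leaf.
# Assumes `clusters` is numeric and already in dendrogram order, and that
# branches_attr_by_clusters() gives cols[i] to the i-th distinct non-0
# cluster met left to right along the dendrogram.
cluster_bar_colors <- function(clusters, cols, zero_col = "white") {
  appearance <- unique(clusters[clusters != 0])
  # recycle colors if there are fewer colors than clusters
  if (length(cols) < length(appearance)) {
    cols <- rep(cols, length.out = length(appearance))
  }
  out <- cols[match(clusters, appearance)]  # NA for the 0 (unassigned) leaves
  out[clusters == 0] <- zero_col            # hypothetical color for unassigned leaves
  out
}
```

Usage, with the numeric clusters vector from the code above (taken before the clusters <- factor(clusters) line) and cols as defined there: colored_bars(cluster_bar_colors(clusters, cols), dend, sort_by_labels_order = FALSE). If there are four non-0 clusters, as the vignette's indexing suggests, the fifth entry of cols simply goes unused, which is what cols[-5] achieves by hand.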