I am trying to plot a correlation matrix that includes thousands of pairwise comparisons, and I am thinking of using ggplot2 in R to do it. There are 5 main issues I would like to address. Some of them have already been solved, but I can amend them if the proposed method has specific prerequisites. I am listing all of them here to ensure the final solution is compatible with them.

  1. I need to plot a composite correlation matrix in which the upper triangle and the lower triangle come from two different matrices. (This has been done.)
  2. I would like to plot the y-axis in reverse order so that the upper triangle is at the top right. (This has been done.)
  3. Some of the entries are clustered into modules, and I would like to add a rectangle to annotate each of them. (This has been done.)
  4. For each cluster, I would like to annotate the cluster with text. In case labels end up crowded together, I have also used repelled text. (This has been done.)
  5. Some annotations can be very long. Rather than wrapping the text over several lines, I would like to write each of them on one line. However, the plot size seems fixed (at least in RStudio), and part of the long strings is cut off. Is it possible to force ggplot to output the full annotations?

Below is the code for the toy dataset and my current approach.

data input:
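
The code below assumes that M1 and M2 already exist. To run it as is, a minimal sketch with made-up stand-ins could look like this. The random values, the seed, and the helper name make_cor are assumptions for illustration only, not the original data:

```r
# Hypothetical stand-ins for M1 and M2 (the original matrices are not shown).
set.seed(1)  # arbitrary seed, only so the toy values are reproducible

# make_cor is a hypothetical helper: it returns the correlation matrix
# of n_var random variables, i.e. a symmetric n_var x n_var matrix
# with 1 on the diagonal.
make_cor <- function(n_obs = 20, n_var = 6) {
  cor(matrix(rnorm(n_obs * n_var), nrow = n_obs, ncol = n_var))
}

M1 <- make_cor()
M2 <- make_cor()
```

Any pair of 6 x 6 numeric matrices would do here; correlation matrices are used only so that the toy values fall between -1 and 1.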

rownames(M1) <- c("s1", "s2", "s3", "s4", "s5", "s6")
colnames(M1) <- c("g1", "g2", "g3", "g4", "g5", "g6")
rownames(M2) <- c("s1", "s2", "s3", "s4", "s5", "s6")
colnames(M2) <- c("g1", "g2", "g3", "g4", "g5", "g6")
M3 <- M1
diag(M3) <- NA
M3[upper.tri(M3)] <- M2[upper.tri(M2)]

cluster_annotation <- data.frame(cluster = c("c3", "c2", "c1"),
                                 cluster_anno = c("This is 3", "This is 2", "This is 111111111111111111111111This text has been cut"))

forgeneorder <- c("g1", "g4", "g5", "g3", "g6", "g2")
forsampleorder <- c("s1", "s4", "s5", "s3", "s6", "s2")
annotation_dataset <- data.frame(gene =    c("g1", "g2", "g3", "g4", "g5", "g6"),
                                 cluster = c("c2", "c3", "c1", "c2", "c2", "c3"))

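My current attempt:

library(dplyr)    # %>%, mutate(), arrange(), group_by(), left_join(), ...
library(tidyr)    # spread(), gather()
library(tibble)   # as_tibble()
library(ggplot2)
library(ggrepel)  # geom_text_repel()

annotation_data <- annotation_dataset %>% 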
  as_tibble() %>% 
  mutate(gene = factor(gene,  levels = !!rev(forgeneorder))) %>% 
  arrange(desc(gene)) %>% 
  mutate(geneorder = rev(row_number())) %>% 
  group_by(cluster) %>% 
  mutate(cluster_order = rev(row_number()),
         cluster_min = min(cluster_order),
         cluster_max = max(cluster_order),
         cluster_middle = mean(geneorder)) %>% 
  filter(cluster_order == cluster_min | cluster_order == cluster_max) %>% 
  ungroup() %>% 
  mutate(vertexes = ifelse(cluster_order == cluster_min, geneorder - 0.5, geneorder + 0.5),
         positions = ifelse(cluster_order == cluster_min, "bottumleft", "topright"),
         maxgene = max(geneorder)) %>% 
  dplyr::select(-cluster_order, -cluster_min, -cluster_max, -geneorder, -gene) %>% 
  spread(positions, vertexes) %>% 
  left_join(cluster_annotation, by = "cluster") %>% 
  mutate(bottumright = maxgene - bottumleft + 1,
         topleft = maxgene - topright + 1)

as_tibble(M3, rownames = "sample") %>% 
  gather(gene, correlation, -sample) %>% 
  mutate(gene = factor(gene,  levels = !!rev(forgeneorder)),
         sample = factor(sample,  levels = !!forsampleorder)) %>% 
  ggplot() + 
  geom_tile(aes(x = sample, y = gene, fill = correlation)) +
  with(annotation_data, annotate(geom = "rect", fill = "transparent", color = "black", size = 1.5,
           xmin = topleft, ymin = bottumleft, xmax = bottumright, ymax = topright))+
  # with(annotation_data, annotate(geom = "text", color = "black", size = maxgene*1.2,
  #          x= maxgene + 0.75, y = cluster_middle, label = cluster_anno, hjust = 0))+
  geom_text_repel(data = annotation_data,
                  aes(x= maxgene + 0.75, y = cluster_middle, label = cluster_anno), 
                  direction = "y", 
                  hjust = 0, 
                  segment.size = 0.2,
                  na.rm = TRUE,
                  xlim = c(NA, Inf)
                  ) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
       axis.line = element_blank(),
       axis.ticks = element_blank(),
       axis.title = element_blank(),
       legend.position = "top")

The current output (image not shown):

One approach is to use the "secondary axis trick" instead of adding the labels via geom_text_repel. Because a discrete scale does not allow a secondary axis, this requires converting your gene variable to numeric so that a continuous scale can be used. And since you removed the axes completely, the axis text for the secondary scale has to be added back using theme(..., axis.text.y.right = element_text()):


m3 <- as_tibble(M3, rownames = "sample") %>% 
  gather(gene, correlation, -sample) %>% 
  mutate(gene = factor(gene,  levels = !!rev(forgeneorder)),
         sample = factor(sample,  levels = !!forsampleorder))

ggplot(m3) + 
  geom_tile(aes(x = sample, y = as.numeric(gene), fill = correlation)) +
  with(annotation_data, annotate(geom = "rect", fill = "transparent", color = "black", size = 1.5,
                                 xmin = topleft, ymin = bottumleft, xmax = bottumright, ymax = topright)) +
  scale_y_continuous(sec.axis = dup_axis(breaks = annotation_data$cluster_middle,
                                         labels = annotation_data$cluster_anno)) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        axis.text.y.right = element_text(),
        legend.position = "top")
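
To see why as.numeric(gene) lines up with the rectangles and the label positions: converting a factor to numeric returns the position of each value among the factor levels, so the first level is drawn at y = 1, the second at y = 2, and so on. This is the same numbering that geneorder (and hence cluster_middle) uses in annotation_data. A quick check with the gene order from the question, where the three genes picked are only an illustration:

```r
# Gene order taken from the question; the levels are reversed as in the plot code
forgeneorder <- c("g1", "g4", "g5", "g3", "g6", "g2")
gene <- factor(c("g1", "g2", "g3"), levels = rev(forgeneorder))

levels(gene)
# "g2" "g6" "g3" "g5" "g4" "g1"

# Position of each gene among the levels, i.e. its y coordinate
gene_pos <- as.numeric(gene)
gene_pos
# 6 1 3
```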

This approach also works if you want the labels on the left, as you mentioned in your comment. In that case, we simply position the primary y axis on the right using scale_y_continuous(..., position = "right"), which moves the secondary axis to the left, and add the axis text for the secondary scale via axis.text.y.left = element_text().
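
A minimal, self-contained sketch of this left-hand variant, using a small made-up dataset. Here toy, label_pos and label_text are hypothetical stand-ins for m3, annotation_data$cluster_middle and annotation_data$cluster_anno, and the rectangles are left out to keep the example short:

```r
library(ggplot2)

# Toy stand-in data (hypothetical): a 3 x 3 grid of tiles with a numeric y
toy <- expand.grid(sample = c("s1", "s2", "s3"), gene = 1:3)
toy$correlation <- seq(-1, 1, length.out = nrow(toy))

# Hypothetical label positions and texts
label_pos  <- c(1, 2.5)
label_text <- c("This is 2", "A much longer annotation written on a single line")

p_left <- ggplot(toy) +
  geom_tile(aes(x = sample, y = gene, fill = correlation)) +
  # Primary axis on the right, so the duplicated (secondary) axis lands on the left
  scale_y_continuous(position = "right",
                     sec.axis = dup_axis(breaks = label_pos,
                                         labels = label_text)) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        # Re-enable only the text of the axis that carries the cluster labels
        axis.text.y.left = element_text(),
        legend.position = "top")

p_left
```

With the real data, the same two changes (position = "right" and axis.text.y.left) would be applied to the ggplot(m3) call above.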

