Force ggplot to include annotations that fall outside the plot area in R


I am trying to plot a correlation matrix that includes thousands of pairwise comparisons, and I am planning to use ggplot2 in R for it. There are five main points I would like to address. Most of them have already been solved, but I can amend them if the proposed method has specific prerequisites; I list them here to ensure the final solution is compatible with them.

  1. I need to plot a combined correlation matrix whose upper and lower triangles come from two different matrices. (This has been done.)
  2. I would like to plot the y-axis in reverse order so that the upper triangle is at the top right. (This has been done.)
  3. Some of the entries are clustered into modules, and I would like to add a rectangle to annotate each of them. (This has been done.)
  4. For each cluster, I would like to annotate the cluster with text. In case labels end up crowded together, I have also used repelled text. (This has been done.)
  5. Some annotations can be very long. Rather than wrapping the text over several lines, I would like to write each one on a single line. However, the plot size seems to be fixed (at least in RStudio), and part of the long strings is cut off. Is it possible to force ggplot to output the full annotations?

Below is the code for the toy dataset and my current approach.

Data input:

set.seed(1234)
M1<-matrix(rnorm(36)*3,nrow=6)
rownames(M1) <- c("s1", "s2", "s3", "s4", "s5", "s6")
colnames(M1) <- c("g1", "g2", "g3", "g4", "g5", "g6")
set.seed(2345)
M2<-matrix(rnorm(36),nrow=6)
rownames(M2) <- c("s1", "s2", "s3", "s4", "s5", "s6")
colnames(M2) <- c("g1", "g2", "g3", "g4", "g5", "g6")
M3 <- M1
diag(M3) <- NA
M3[upper.tri(M3)] <- M2[upper.tri(M2)]

cluster_annotation <- data.frame(cluster = c("c3", "c2", "c1"),
                                 cluster_anno = c("This is 3", "This is 2", "This is 111111111111111111111111This text has been cut"))

forgeneorder <- c("g1", "g4", "g5", "g3", "g6", "g2")
forsampleorder <- c("s1", "s4", "s5", "s3", "s6", "s2")
annotation_dataset <- data.frame(gene =    c("g1", "g2", "g3", "g4", "g5", "g6"),
                                 cluster = c("c2", "c3", "c1", "c2", "c2", "c3"))

My current attempt:

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)

annotation_data <- annotation_dataset %>% 
  as_tibble() %>% 
  mutate(gene = factor(gene,  levels = !!rev(forgeneorder))) %>% 
  arrange(desc(gene)) %>% 
  mutate(geneorder = rev(row_number())) %>% 
  group_by(cluster) %>% 
  mutate(cluster_order = rev(row_number()),
         cluster_min = min(cluster_order),
         cluster_max = max(cluster_order),
         cluster_middle = mean(geneorder)) %>% 
  filter(cluster_order == cluster_min | cluster_order == cluster_max) %>% 
  ungroup() %>% 
  mutate(vertexes = ifelse(cluster_order == cluster_min, geneorder - 0.5, geneorder + 0.5),
         positions = ifelse(cluster_order == cluster_min, "bottumleft", "topright"),
         maxgene = max(geneorder)) %>% 
  dplyr::select(-cluster_order, -cluster_min, -cluster_max, -geneorder, -gene) %>% 
  spread(positions, vertexes) %>% 
  left_join(cluster_annotation, by = "cluster") %>% 
  mutate(bottumright = maxgene - bottumleft + 1,
         topleft = maxgene - topright + 1)


as_tibble(M3, rownames = "sample") %>% 
  gather(gene, correlation, -sample) %>% 
  mutate(gene = factor(gene,  levels = !!rev(forgeneorder)),
         sample = factor(sample,  levels = !!forsampleorder)) %>% 
  ggplot() + 
  geom_tile(aes(x = sample, y = gene, fill = correlation)) +
  with(annotation_data, annotate(geom = "rect", fill = "transparent", color = "black", size = 1.5,
           xmin = topleft, ymin = bottumleft, xmax = bottumright, ymax = topright))+
  # with(annotation_data, annotate(geom = "text", color = "black", size = maxgene*1.2,
  #          x= maxgene + 0.75, y = cluster_middle, label = cluster_anno, hjust = 0))+
  geom_text_repel(data = annotation_data,
                  aes(x= maxgene + 0.75, y = cluster_middle, label = cluster_anno), 
                  direction = "y", 
                  hjust = 0, 
                  segment.size = 0.2,
                  na.rm = TRUE,
                  xlim = c(NA, Inf)
                  ) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
       axis.line = element_blank(),
       axis.ticks = element_blank(),
       axis.title = element_blank(),
       legend.position = "top")

The current output:

[Image: current output]

There is 1 best solution below

One approach is to use the "secondary axis trick" instead of adding the labels via geom_text_repel. Axis labels are part of the plot layout, so ggplot2 reserves space for them instead of drawing them outside the plot area. Because a discrete scale does not allow a secondary axis, this requires converting your gene variable to numeric so that a continuous scale can be used. And because you removed the axes completely, the axis text for the secondary axis has to be added back using theme(..., axis.text.y.right = element_text()):

library(dplyr)
library(tidyr)
library(ggplot2)

m3 <- as_tibble(M3, rownames = "sample") %>% 
  gather(gene, correlation, -sample) %>% 
  mutate(gene = factor(gene,  levels = !!rev(forgeneorder)),
         sample = factor(sample,  levels = !!forsampleorder))

ggplot(m3) + 
  geom_tile(aes(x = sample, y = as.numeric(gene), fill = correlation)) +
  with(annotation_data, annotate(geom = "rect", fill = "transparent", color = "black", size = 1.5,
                                 xmin = topleft, ymin = bottumleft, xmax = bottumright, ymax = topright)) +
  scale_y_continuous(sec.axis = dup_axis(breaks = annotation_data$cluster_middle,
                                         labels = annotation_data$cluster_anno)) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        axis.text.y.right = element_text(),
        legend.position = "top")

[Image: resulting plot with the cluster annotations shown as axis labels on the right]
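
The conversion to numeric works because as.numeric() on a factor returns the position of each value within the factor's levels, so every gene gets a fixed y coordinate on the continuous scale. A small illustration, using the gene order from the question (the vector genes is made up purely for this demonstration):

```r
# Gene order taken from the question
forgeneorder <- c("g1", "g4", "g5", "g3", "g6", "g2")

# Hypothetical input vector, only for illustration
genes <- factor(c("g1", "g2", "g3"), levels = rev(forgeneorder))

levels(genes)
# "g2" "g6" "g3" "g5" "g4" "g1"

# Each value is mapped to the index of its level
gene_pos <- as.numeric(genes)
gene_pos
# 6 1 3
```

This is why the rectangle coordinates and cluster_middle computed in annotation_data can be reused unchanged on the continuous y scale.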

This approach also works if you want the labels on the left, as you mentioned in your comment. In that case, simply move the primary y axis to the right using scale_y_continuous(..., position = "right"), which puts the secondary axis on the left, and add the axis text for the secondary axis via axis.text.y.left = element_text():
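
The code for this variant is not shown above, so here is a minimal, self-contained sketch of what it might look like. The objects toy, label_pos and label_txt are hypothetical stand-ins for m3, annotation_data$cluster_middle and annotation_data$cluster_anno, and the rectangles are left out for brevity:

```r
library(ggplot2)

# Hypothetical stand-in data: a 6 x 6 grid with numeric y positions
toy <- expand.grid(sample = paste0("s", 1:6), gene_pos = 1:6)
set.seed(1)
toy$correlation <- rnorm(nrow(toy))

# Hypothetical label positions and texts (one label is long on purpose)
label_pos <- c(1.5, 3.5, 5.5)
label_txt <- c("This is 3", "This is 2",
               "This is a deliberately long cluster annotation")

p_left <- ggplot(toy) +
  geom_tile(aes(x = sample, y = gene_pos, fill = correlation)) +
  # Primary axis goes to the right, so the secondary axis lands on the left
  scale_y_continuous(position = "right",
                     sec.axis = dup_axis(breaks = label_pos,
                                         labels = label_txt)) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        # Re-enable the text of the (now left-hand) secondary axis only
        axis.text.y.left = element_text(),
        legend.position = "top")

p_left
```

If a very long label still squeezes the panel in the RStudio plot pane, enlarging the pane or saving with ggsave() and a larger width should give the plot more room.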

[Image: resulting plot with the cluster annotations shown as axis labels on the left]