Is there a way to shorten long intergenic regions in gggenes?


I'm visualizing a series of gene clusters with gggenes, and I'm wondering whether there is built-in functionality or a workaround for shortening long, unannotated regions between genes of interest. Here's the set of genes giving me grief:

df <- data.frame(
  start = c(594198, 596540, 598457, 600085, 983488, 984345),
  stop = c(596450, 598423, 600070, 601182, 984336, 986495),
  species = rep("Ferriphaselus amnicola", 6),
  gene = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")
)

When you use gggenes to visualize this, you get something not so pretty:

library(ggplot2)
library(gggenes)

ggplot(df, aes(xmin = start, xmax = stop, y = species, fill = gene)) +
    geom_gene_arrow() +
    facet_wrap(~ species, scales = "free", ncol = 1) +
    scale_fill_brewer(palette = "Set3") +
    theme_genes()


Ideally, I'd be able to tell gggenes that when there are more than x nucleotides between two genes, it should replace that span of genome with two slashes (as is customary in the literature). I'm imagining something like an edit I cobbled together in PowerPoint, where the long gap is replaced by a "//" mark.

Is there a straightforward way to do this in gggenes, or even in another package?

Thank you!


There are 3 best solutions below


Searching Google for an axis-breaking option brought up the ggbreak package:

install.packages("ggbreak")
library(ggbreak)

It took a bit of fiddling to narrow down the proper breakpoints, but this looks promising. Assuming you assigned your plot object to the name plt:

# cut out the x-axis range between the end of gene4 and the start of gene5
plt2 <- plt + scale_x_break(c(6.05e+05, 9.8e+05))
plt2


After the first round of fiddling I started getting warnings like the one below, but the plots were still produced.

In Summary.unit(list(list(0, NULL, 8L), list(1, list(list(1, list( : reached elapsed time limit
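
Rather than finding the breakpoints by trial and error, they can be computed from the gene coordinates. The following is only a sketch: find_breaks() is a hypothetical helper (not part of ggbreak or gggenes), and the 10 kb threshold and 2 kb padding are arbitrary assumptions that should be tuned to the data.

```r
# Gene coordinates copied from the question
df <- data.frame(
  start = c(594198, 596540, 598457, 600085, 983488, 984345),
  stop  = c(596450, 598423, 600070, 601182, 984336, 986495)
)

# Hypothetical helper: locate intergenic gaps longer than `max_gap` and
# return the x-range to cut, keeping `pad` bp of flank on each side.
# Keep `pad` below max_gap / 2 so that `from` always stays below `to`.
find_breaks <- function(start, stop, max_gap = 10000, pad = 2000) {
  ord <- order(start)
  start <- start[ord]
  stop <- stop[ord]
  # furthest gene end seen so far, so overlapping genes do not create fake gaps
  reach <- cummax(stop)
  gap_from <- reach[-length(reach)]
  gap_to <- start[-1]
  long <- (gap_to - gap_from) > max_gap
  data.frame(from = gap_from[long] + pad, to = gap_to[long] - pad)
}

breaks <- find_breaks(df$start, df$stop)
breaks
```

For this data there is a single long gap, so the result can be passed on as plt + scale_x_break(c(breaks$from[1], breaks$to[1])). With several long gaps, one scale_x_break() per row of breaks would be needed.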


This one is really hacky and needs a lot of hand-written code, but it could help as a starting point:

library(dplyr)
library(ggplot2)
library(gggenes)
library(ggpubr)
library(gridExtra)
library(grid)

# Define common components
arrow_geom <- list(
  geom_gene_arrow(
    arrowhead_height = unit(12, "mm"),
    arrowhead_width = unit(6, "mm"),
    arrow_body_height = unit(6, "mm")
  ),
  geom_gene_label(aes(label = gene), height = unit(6, "mm"), grow = TRUE)
)

common_theme <- theme_classic() +
  theme(
    text = element_text(size=20),
    plot.margin = margin(0, 0, 0, 0, "cm")
  )

# Plot with legend
p0 <- df %>%
  ggplot(aes(xmin = start, xmax = stop, y = species, fill = gene)) +
  arrow_geom +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  common_theme

# Extract the legend
legend <- get_legend(p0)

# Plot for gene1 to gene4
p1 <- df %>%
  mutate(median_stop = median(stop)) %>%
  filter(stop > (median_stop - 10000) & stop < (median_stop + 10000)) %>%
  ggplot(aes(xmin = start, xmax = stop, y = species, fill = gene)) +
  arrow_geom +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  labs(y="") +
  guides(fill = "none") +
  common_theme

# Plot for gene5 to gene6
p2 <- df %>%
  mutate(x = median(stop)) %>%
  filter(stop > x + 100000) %>%
  mutate(species = "") %>%
  ggplot(aes(xmin = start, xmax = stop, y = species, fill = gene)) +
  arrow_geom +
  facet_wrap(~ species, scales = "free", ncol = 1) +
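  # gene5 and gene6 get the 5th and 6th Set3 colors so they match the legend taken from p0
  scale_fill_manual(values = RColorBrewer::brewer.pal(6, "Set3")[5:6]) +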
  labs(y = "") +
  guides(fill = "none") +
  common_theme +
  theme(
    axis.line.y = element_blank(),
    axis.ticks.y = element_blank(),
    strip.background = element_blank()
  )


# start from a blank page (dev.off() fails with an error when no graphics device is open)
grid.newpage()

# set the margins and combine all
p1 <- p1 + theme(plot.margin = margin(0, 0, 0, 0, "cm")) 
p2 <- p2 + theme(plot.margin = margin(0, 0, 0, 0, "cm"))
g <- arrangeGrob(
  p1, p2, legend,
  nrow = 1,
  widths = c(4, 2, 1)  
)
grid.draw(g)


I'm the author of gggenes. You can't do this within gggenes. However, you can still achieve this with ggplot2 in a few different ways:

  • You could generate separate plots for each CDS region and compose them with the patchwork package
  • As IRTFM suggested, you could use the ggbreak package to insert axis breaks, although note that broken axes are somewhat frowned upon as they distort the mapping between data and plot aesthetics and can be misleading
  • I wrote a package called ggwrap which takes a wide plot and wraps it over multiple rows, originally intended for plotting long sequences with gggenes
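
To make the first option concrete, here is a minimal sketch of the patchwork approach. It assumes that a gap of more than 10 kb starts a new region (an arbitrary threshold), and the names region, panels, spans and combined are made up for this example. It places the panels side by side but does not draw the "//" marks.

```r
library(ggplot2)
library(gggenes)
library(patchwork)

# Data copied from the question
df <- data.frame(
  start = c(594198, 596540, 598457, 600085, 983488, 984345),
  stop = c(596450, 598423, 600070, 601182, 984336, 986495),
  species = rep("Ferriphaselus amnicola", 6),
  gene = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")
)

# Assumption: a gene starting more than 10 kb after the previous gene's end opens a new region
df <- df[order(df$start), ]
df$region <- cumsum(c(TRUE, df$start[-1] - df$stop[-nrow(df)] > 10000))

# Shared factor levels, so every panel maps each gene to the same color
df$gene <- factor(df$gene)

# One panel per region; drop = FALSE and show.legend = TRUE keep the legends identical
panels <- lapply(split(df, df$region), function(d) {
  ggplot(d, aes(xmin = start, xmax = stop, y = species, fill = gene)) +
    geom_gene_arrow(show.legend = TRUE) +
    scale_fill_brewer(palette = "Set3", drop = FALSE) +
    theme_genes()
})

# Panel widths proportional to the genomic span each region covers
spans <- sapply(split(df, df$region), function(d) max(d$stop) - min(d$start))

# Side-by-side panels with a single shared legend
combined <- wrap_plots(panels, nrow = 1, widths = unname(spans), guides = "collect")
combined
```

Scaling the widths by span keeps gene lengths roughly comparable between panels. The species label is repeated on every panel and could be blanked on all but the first with theme(axis.text.y = element_blank()).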