Adding vertical gene connection lines and improving color palette in ggplot2 and gggenes


I am visualising a gene dataset with the ggplot2 and gggenes packages in R to show the conservation of gene neighbourhood (synteny) across evolution. I have created a gene plot in which each gene is drawn as an arrow, labelled with its symbol, with the arrow direction showing the gene's orientation. However, I am now facing two challenges that I would like some help with.

1. Adding Vertical Lines Connecting Genes with the Same Symbol: My dataset contains one row per gene, with its species, symbol, start and stop positions, and orientation. Here is the dataset, stored as data.csv:

species symbol  start   stop    orientation
Homo_sapiens    SLC35A1 1   2   1
Homo_sapiens    RARS2   2   3   0
Homo_sapiens    ORC3    3   4   1
Homo_sapiens    AKIRIN2 4   5   0
Homo_sapiens    SPACA1  5   6   1
Homo_sapiens    CNR1    6   7   0
Homo_sapiens    RNGTT   7   8   0
Homo_sapiens    PNRC1   8   9   1
Homo_sapiens    PM20D2  9   10  1
Homo_sapiens    SRSF12  10  11  0
Homo_sapiens    GABRR1  11  12  0
Mus_musculus    GABRR1  1   2   1
Mus_musculus    PM20D2  2   3   0
Mus_musculus    SRSF12  3   4   1
Mus_musculus    PNRC1   4   5   0
Mus_musculus    RNGTT   5   6   1
Mus_musculus    CNR1    6   7   1
Mus_musculus    SPACA1  7   8   0
Mus_musculus    AKIRIN2 8   9   1
Mus_musculus    ORC3    9   10  0
Mus_musculus    RARS2   10  11  1
Mus_musculus    SLC35A1 11  12  0
Rattus_norvegicus   GABRR1  1   2   1
Rattus_norvegicus   PM20D2  2   3   0
Rattus_norvegicus   SRSF12  3   4   1
Rattus_norvegicus   PNRC1   4   5   0
Rattus_norvegicus   RNGTT   5   6   1
Rattus_norvegicus   CNR1    6   7   1
Rattus_norvegicus   SPACA1  7   8   0
Rattus_norvegicus   AKIRIN2 8   9   1
Rattus_norvegicus   ORC3    9   10  0
Rattus_norvegicus   RARS2   10  11  1
Rattus_norvegicus   SLC35A1 11  12  0
Canis_lupus_familiaris  SLC35A1 1   2   1
Canis_lupus_familiaris  RARS2   2   3   0
Canis_lupus_familiaris  ORC3    3   4   1
Canis_lupus_familiaris  AKIRIN2 4   5   0
Canis_lupus_familiaris  SPACA1  5   6   1
Canis_lupus_familiaris  CNR1    6   7   0
Canis_lupus_familiaris  RNGTT   7   8   0
Canis_lupus_familiaris  PNRC1   8   9   1
Canis_lupus_familiaris  SRSF12  9   10  0
Canis_lupus_familiaris  PM20D2  10  11  1
Canis_lupus_familiaris  GABRR1  11  12  0
Monodelphis_domestica   SLC35A1 1   2   1
Monodelphis_domestica   RARS2   2   3   0
Monodelphis_domestica   ORC3    3   4   1
Monodelphis_domestica   AKIRIN2 4   5   0
Monodelphis_domestica   SPACA1  5   6   1
Monodelphis_domestica   CNR1    6   7   0
Monodelphis_domestica   RNGTT   7   8   0
Monodelphis_domestica   PNRC1   8   9   1
Monodelphis_domestica   SRSF12  9   10  0
Monodelphis_domestica   PM20D2  10  11  1
Monodelphis_domestica   GABRR1  11  12  0
Ornithorhynchus_anatinus    SLC35A1 1   2   1
Ornithorhynchus_anatinus    RARS2   2   3   0
Ornithorhynchus_anatinus    ORC3    3   4   1
Ornithorhynchus_anatinus    AKIRIN2 4   5   0
Ornithorhynchus_anatinus    SPACA1  5   6   1
Ornithorhynchus_anatinus    CNR1    6   7   0
Ornithorhynchus_anatinus    RNGTT   7   8   0
Ornithorhynchus_anatinus    PNRC1   8   9   1
Ornithorhynchus_anatinus    PM20D2  9   10  1
Ornithorhynchus_anatinus    LOC100076186    10  11  0
Ornithorhynchus_anatinus    LOC114805750    11  12  1
Gallus_gallus   PM20D2  1   2   0
Gallus_gallus   PNRC1   2   3   0
Gallus_gallus   BORCS6  3   4   1
Gallus_gallus   RNGTT   4   5   1
Gallus_gallus   LOC101749895    5   6   1
Gallus_gallus   CNR1    6   7   1
Gallus_gallus   SPACA1  7   8   0
Gallus_gallus   AKIRIN2 8   9   1
Gallus_gallus   ORC3    9   10  0
Gallus_gallus   RARS2   10  11  1
Gallus_gallus   SLC35A1 11  12  0
Taeniopygia_guttata CFAP206 1   2   1
Taeniopygia_guttata SLC35A1 2   3   1
Taeniopygia_guttata RARS2   3   4   0
Taeniopygia_guttata ORC3    4   5   1
Taeniopygia_guttata AKIRIN2 5   6   0
Taeniopygia_guttata CNR1    6   7   0
Taeniopygia_guttata RNGTT   7   8   0
Taeniopygia_guttata BORCS6  8   9   0
Taeniopygia_guttata PNRC1   9   10  1
Taeniopygia_guttata PM20D2  10  11  1
Taeniopygia_guttata GABRR1  11  12  0
Chelonia_mydas  SLC35A1 1   2   1
Chelonia_mydas  RARS2   2   3   0
Chelonia_mydas  ORC3    3   4   1
Chelonia_mydas  AKIRIN2 4   5   0
Chelonia_mydas  SPACA1  5   6   1
Chelonia_mydas  CNR1    6   7   0
Chelonia_mydas  RNGTT   7   8   0
Chelonia_mydas  LOC102938330    8   9   0
Chelonia_mydas  PNRC1   9   10  1
Chelonia_mydas  PM20D2  10  11  1
Chelonia_mydas  GABRR1  11  12  0
Anolis_carolinensis PM20D2  1   2   0
Anolis_carolinensis SRSF12  2   3   1
Anolis_carolinensis PNRC1   3   4   0
Anolis_carolinensis RNGTT   4   5   1
Anolis_carolinensis LOC107982676    5   6   0
Anolis_carolinensis CNR1    6   7   1
Anolis_carolinensis SPACA1  7   8   0
Anolis_carolinensis AKIRIN2 8   9   1
Anolis_carolinensis ORC3    9   10  0
Anolis_carolinensis RARS2   10  11  1
Anolis_carolinensis SLC35A1 11  12  0
Xenopus_laevis  GABRR2.S    1   2   1
Xenopus_laevis  GABRR1.S    2   3   1
Xenopus_laevis  PM20D2.S    3   4   0
Xenopus_laevis  LOC108717975    4   5   1
Xenopus_laevis  RNGTT.S 5   6   1
Xenopus_laevis  CNR1.S  6   7   1
Xenopus_laevis  AKIRIN2.S   7   8   1
Xenopus_laevis  ORC3.S  8   9   0
Xenopus_laevis  RARS2.S 9   10  1
Xenopus_laevis  SLC35A1.S   10  11  0
Xenopus_laevis  LOC108717977    11  12  1
Latimeria_chalumnae DDX24   1   2   0
Latimeria_chalumnae PPP4R4  2   3   1
Latimeria_chalumnae SERPINA10B  3   4   0
Latimeria_chalumnae ARRDC3A 4   5   1
Latimeria_chalumnae LOC102360869    5   6   0
Latimeria_chalumnae CNR1    6   7   1
Latimeria_chalumnae SPACA1  7   8   0
Latimeria_chalumnae AKIRIN2 8   9   1
Latimeria_chalumnae ORC3    9   10  0
Latimeria_chalumnae RARS2   10  11  1
Latimeria_chalumnae LOC102362557    11  12  1
Protopterus_annectens   LOC122794922    1   2   1
Protopterus_annectens   LOC122794923    2   3   1
Protopterus_annectens   LOC122794924    3   4   1
Protopterus_annectens   FBXL5   4   5   1
Protopterus_annectens   CC2D2A  5   6   0
Protopterus_annectens   CNR1    6   7   1
Protopterus_annectens   CPEB2   7   8   0
Protopterus_annectens   BOD1L1  8   9   0
Protopterus_annectens   C1QTNF7 9   10  0
Protopterus_annectens   NKX3-2  10  11  1
Protopterus_annectens   RAB28   11  12  1
Danio_rerio MYO6A   1   2   1
Danio_rerio LOC569340   2   3   0
Danio_rerio MEI4    3   4   1
Danio_rerio NT5E    4   5   1
Danio_rerio SNX14   5   6   0
Danio_rerio CNR1    6   7   0
Danio_rerio RNGTT   7   8   0
Danio_rerio PNRC1   8   9   1
Danio_rerio GABRR1  9   10  0
Danio_rerio GABRR2B 10  11  0
Danio_rerio UBE2J1  11  12  0
Oreochromis_niloticus   SI:DKEY-174M14.3    1   2   1
Oreochromis_niloticus   RDH14B  2   3   0
Oreochromis_niloticus   LOC102078481    3   4   1
Oreochromis_niloticus   RNGTT   4   5   1
Oreochromis_niloticus   LOC112842425    5   6   0
Oreochromis_niloticus   CNR1    6   7   1
Oreochromis_niloticus   AKIRIN2 7   8   1
Oreochromis_niloticus   RARS2   8   9   1
Oreochromis_niloticus   SLC35A1 9   10  0
Oreochromis_niloticus   LOC100692709    10  11  0
Oreochromis_niloticus   LOC102081816    11  12  1
Scyliorhinus_canicula   SLC35A1 1   2   1
Scyliorhinus_canicula   RARS2   2   3   0
Scyliorhinus_canicula   ORC3    3   4   1
Scyliorhinus_canicula   AKIRIN2 4   5   0
Scyliorhinus_canicula   LOC119967921    5   6   1
Scyliorhinus_canicula   CNR1    6   7   0
Scyliorhinus_canicula   RNGTT   7   8   0
Scyliorhinus_canicula   LOC119967175    8   9   0
Scyliorhinus_canicula   PNRC1   9   10  1
Scyliorhinus_canicula   LOC119967178    10  11  1
Scyliorhinus_canicula   LOC119967180    11  12  0
Petromyzon_marinus  LOC116953416    1   2   0
Petromyzon_marinus  LOC116953419    2   3   0
Petromyzon_marinus  CEP162  3   4   1
Petromyzon_marinus  FBXL22  4   5   0
Petromyzon_marinus  RNGTT   5   6   1
Petromyzon_marinus  CNR1    6   7   1
Petromyzon_marinus  AKIRIN2 7   8   1
Petromyzon_marinus  ORC3    8   9   0
Petromyzon_marinus  RARS2   9   10  1
Petromyzon_marinus  SLC35A1 10  11  0
Petromyzon_marinus  RHBDL2  11  12  1

The start and stop values are serial integers rather than genomic coordinates, because I want the genes to sit adjacent to each other without any gaps.

Using this data, I've generated a gene plot with arrows representing each gene's position and orientation. However, I'd like to add vertical lines connecting genes that have the same symbol value across different species. How can I achieve this? Is there any other package to help me with that?
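One possible approach, sketched here under assumptions rather than as a tested solution for the full dataset: compute one line segment for every symbol shared by two vertically adjacent species, running from the centre of the gene in one row to the centre of the same gene in the next row. The helper name `make_links` and the toy data frame are hypothetical; the column names follow data.csv. Only base R is used.

```r
# Build one segment per symbol shared by vertically adjacent species.
# `genes` needs the columns species, symbol, start and stop (as in data.csv).
# make_links is a hypothetical helper name, not part of ggplot2 or gggenes.
make_links <- function(genes) {
  sp <- unique(as.character(genes$species))    # order of appearance = row order
  genes$mid <- (genes$start + genes$stop) / 2  # x position of each gene centre
  pairs <- lapply(seq_len(max(length(sp) - 1, 0)), function(i) {
    upper <- genes[genes$species == sp[i], c("symbol", "mid")]
    lower <- genes[genes$species == sp[i + 1], c("symbol", "mid")]
    # Keep only symbols present in both neighbouring species
    m <- merge(upper, lower, by = "symbol", suffixes = c("_upper", "_lower"))
    if (nrow(m) == 0) return(NULL)
    data.frame(symbol = m$symbol,
               x = m$mid_upper, xend = m$mid_lower,
               y = factor(sp[i], levels = sp),
               yend = factor(sp[i + 1], levels = sp))
  })
  do.call(rbind, pairs)
}

# Toy subset of the data above: three genes in human and mouse
toy <- data.frame(
  species = rep(c("Homo_sapiens", "Mus_musculus"), each = 3),
  symbol  = c("SPACA1", "CNR1", "RNGTT", "RNGTT", "CNR1", "SPACA1"),
  start   = c(5, 6, 7, 5, 6, 7),
  stop    = c(6, 7, 8, 6, 7, 8)
)
links <- make_links(toy)
print(links)
```

The resulting data frame could then be passed to a layer such as `geom_segment(data = links, aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE)`, added before `geom_gene_arrow()` so the arrows are drawn on top of the lines. A layer cannot draw across separate facet panels, so this assumes the `facet_wrap()` call is dropped and all species share a single panel with `y = species`.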

2. Improving Color Palette for Publication Quality: I've used the scale_fill_manual function to define a color palette for the gene symbols in the plot. However, the current color palette doesn't meet the standards for publication quality. Could you please provide guidance on how to create a more visually appealing and suitable color palette for publication-ready plots?
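One idea I have considered, as a rough sketch only: with this many symbols, no qualitative palette keeps every colour distinguishable, so it may help to colour only the genes that occur in several species and draw the rest in a neutral grey. The helper name `make_palette`, the `min_species` threshold, the grey value and the choice of the "Dark 3" palette are all assumptions for illustration; `hcl.colors()` ships with base R (3.6 or later).

```r
# Return a named colour vector: symbols found in at least `min_species`
# species get a distinct colour, all other symbols get a neutral grey.
# make_palette is a hypothetical helper name.
make_palette <- function(genes, min_species = 2, other = "grey85") {
  per_symbol <- tapply(as.character(genes$species),
                       as.character(genes$symbol),
                       function(s) length(unique(s)))
  shared <- sort(names(per_symbol)[per_symbol >= min_species])
  pal <- setNames(rep(other, length(per_symbol)), names(per_symbol))
  # "Dark 3" is one of the qualitative palettes listed by hcl.pals()
  pal[shared] <- grDevices::hcl.colors(length(shared), palette = "Dark 3")
  pal
}

# Toy example: CNR1 is shared, the two LOC genes are species-specific
toy_genes <- data.frame(
  species = c("Homo_sapiens", "Homo_sapiens", "Gallus_gallus", "Gallus_gallus"),
  symbol  = c("CNR1", "LOC1", "CNR1", "LOC2")
)
pal <- make_palette(toy_genes)
print(pal)
```

Because the vector is named by symbol, it can be passed as `scale_fill_manual(values = pal)` and each colour is matched to its gene by name rather than by position.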

Here's the current code snippet I'm using to generate the gene plot:

library(ggplot2)
library(gggenes)
library(RColorBrewer)

# Keep the species in the order in which they appear in the file
data$species <- factor(data$species, levels = unique(data$species), ordered = TRUE)

# Stretch the 12-colour Set3 palette to one colour per gene symbol
colourCount <- length(unique(data$symbol))
getPalette <- colorRampPalette(brewer.pal(12, "Set3"))

ggplot(data, aes(xmin = start, xmax = stop, y = species, fill = symbol,
                 label = symbol, forward = orientation)) +
  geom_gene_arrow(arrowhead_height = unit(4, "mm"), arrowhead_width = unit(2, "mm")) +
  geom_gene_label(align = "centre") +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  scale_fill_manual(values = getPalette(colourCount)) +
  scale_x_continuous(expand = expansion()) +
  theme_genes() +
  labs(x = "", y = "") +
  theme(legend.position = "none", axis.text.x = element_blank(),
        axis.ticks.x = element_blank(), axis.line.x = element_blank())

The plot renders properly, but I am having trouble showing the connections between species. In the version of the plot that I am trying to achieve, I added the connections manually using Inkscape. Is there a way to do this in the script?

I would greatly appreciate it if someone could provide a step-by-step solution for adding vertical lines connecting genes with the same symbol and suggest an improved color palette that would be suitable for publication-quality plots. Thank you in advance!
