I am currently working on visualising a gene dataset using the ggplot2 and gggenes packages in R to showcase the conservation of gene neighbourhood (synteny) across evolution. I have successfully created a gene plot in which arrows represent genes, labels show their symbols, and the direction of each arrow reflects the gene's orientation. However, I am now facing two challenges that I would like some help with.
1. Adding Vertical Lines Connecting Genes with the Same Symbol: I have a dataset that contains information about genes, including their symbols, start and stop positions, and orientation. Here's a snippet of the dataset, which is stored as data.csv:
species symbol start stop orientation
Homo_sapiens SLC35A1 1 2 1
Homo_sapiens RARS2 2 3 0
Homo_sapiens ORC3 3 4 1
Homo_sapiens AKIRIN2 4 5 0
Homo_sapiens SPACA1 5 6 1
Homo_sapiens CNR1 6 7 0
Homo_sapiens RNGTT 7 8 0
Homo_sapiens PNRC1 8 9 1
Homo_sapiens PM20D2 9 10 1
Homo_sapiens SRSF12 10 11 0
Homo_sapiens GABRR1 11 12 0
Mus_musculus GABRR1 1 2 1
Mus_musculus PM20D2 2 3 0
Mus_musculus SRSF12 3 4 1
Mus_musculus PNRC1 4 5 0
Mus_musculus RNGTT 5 6 1
Mus_musculus CNR1 6 7 1
Mus_musculus SPACA1 7 8 0
Mus_musculus AKIRIN2 8 9 1
Mus_musculus ORC3 9 10 0
Mus_musculus RARS2 10 11 1
Mus_musculus SLC35A1 11 12 0
Rattus_norvegicus GABRR1 1 2 1
Rattus_norvegicus PM20D2 2 3 0
Rattus_norvegicus SRSF12 3 4 1
Rattus_norvegicus PNRC1 4 5 0
Rattus_norvegicus RNGTT 5 6 1
Rattus_norvegicus CNR1 6 7 1
Rattus_norvegicus SPACA1 7 8 0
Rattus_norvegicus AKIRIN2 8 9 1
Rattus_norvegicus ORC3 9 10 0
Rattus_norvegicus RARS2 10 11 1
Rattus_norvegicus SLC35A1 11 12 0
Canis_lupus_familiaris SLC35A1 1 2 1
Canis_lupus_familiaris RARS2 2 3 0
Canis_lupus_familiaris ORC3 3 4 1
Canis_lupus_familiaris AKIRIN2 4 5 0
Canis_lupus_familiaris SPACA1 5 6 1
Canis_lupus_familiaris CNR1 6 7 0
Canis_lupus_familiaris RNGTT 7 8 0
Canis_lupus_familiaris PNRC1 8 9 1
Canis_lupus_familiaris SRSF12 9 10 0
Canis_lupus_familiaris PM20D2 10 11 1
Canis_lupus_familiaris GABRR1 11 12 0
Monodelphis_domestica SLC35A1 1 2 1
Monodelphis_domestica RARS2 2 3 0
Monodelphis_domestica ORC3 3 4 1
Monodelphis_domestica AKIRIN2 4 5 0
Monodelphis_domestica SPACA1 5 6 1
Monodelphis_domestica CNR1 6 7 0
Monodelphis_domestica RNGTT 7 8 0
Monodelphis_domestica PNRC1 8 9 1
Monodelphis_domestica SRSF12 9 10 0
Monodelphis_domestica PM20D2 10 11 1
Monodelphis_domestica GABRR1 11 12 0
Ornithorhynchus_anatinus SLC35A1 1 2 1
Ornithorhynchus_anatinus RARS2 2 3 0
Ornithorhynchus_anatinus ORC3 3 4 1
Ornithorhynchus_anatinus AKIRIN2 4 5 0
Ornithorhynchus_anatinus SPACA1 5 6 1
Ornithorhynchus_anatinus CNR1 6 7 0
Ornithorhynchus_anatinus RNGTT 7 8 0
Ornithorhynchus_anatinus PNRC1 8 9 1
Ornithorhynchus_anatinus PM20D2 9 10 1
Ornithorhynchus_anatinus LOC100076186 10 11 0
Ornithorhynchus_anatinus LOC114805750 11 12 1
Gallus_gallus PM20D2 1 2 0
Gallus_gallus PNRC1 2 3 0
Gallus_gallus BORCS6 3 4 1
Gallus_gallus RNGTT 4 5 1
Gallus_gallus LOC101749895 5 6 1
Gallus_gallus CNR1 6 7 1
Gallus_gallus SPACA1 7 8 0
Gallus_gallus AKIRIN2 8 9 1
Gallus_gallus ORC3 9 10 0
Gallus_gallus RARS2 10 11 1
Gallus_gallus SLC35A1 11 12 0
Taeniopygia_guttata CFAP206 1 2 1
Taeniopygia_guttata SLC35A1 2 3 1
Taeniopygia_guttata RARS2 3 4 0
Taeniopygia_guttata ORC3 4 5 1
Taeniopygia_guttata AKIRIN2 5 6 0
Taeniopygia_guttata CNR1 6 7 0
Taeniopygia_guttata RNGTT 7 8 0
Taeniopygia_guttata BORCS6 8 9 0
Taeniopygia_guttata PNRC1 9 10 1
Taeniopygia_guttata PM20D2 10 11 1
Taeniopygia_guttata GABRR1 11 12 0
Chelonia_mydas SLC35A1 1 2 1
Chelonia_mydas RARS2 2 3 0
Chelonia_mydas ORC3 3 4 1
Chelonia_mydas AKIRIN2 4 5 0
Chelonia_mydas SPACA1 5 6 1
Chelonia_mydas CNR1 6 7 0
Chelonia_mydas RNGTT 7 8 0
Chelonia_mydas LOC102938330 8 9 0
Chelonia_mydas PNRC1 9 10 1
Chelonia_mydas PM20D2 10 11 1
Chelonia_mydas GABRR1 11 12 0
Anolis_carolinensis PM20D2 1 2 0
Anolis_carolinensis SRSF12 2 3 1
Anolis_carolinensis PNRC1 3 4 0
Anolis_carolinensis RNGTT 4 5 1
Anolis_carolinensis LOC107982676 5 6 0
Anolis_carolinensis CNR1 6 7 1
Anolis_carolinensis SPACA1 7 8 0
Anolis_carolinensis AKIRIN2 8 9 1
Anolis_carolinensis ORC3 9 10 0
Anolis_carolinensis RARS2 10 11 1
Anolis_carolinensis SLC35A1 11 12 0
Xenopus_laevis GABRR2.S 1 2 1
Xenopus_laevis GABRR1.S 2 3 1
Xenopus_laevis PM20D2.S 3 4 0
Xenopus_laevis LOC108717975 4 5 1
Xenopus_laevis RNGTT.S 5 6 1
Xenopus_laevis CNR1.S 6 7 1
Xenopus_laevis AKIRIN2.S 7 8 1
Xenopus_laevis ORC3.S 8 9 0
Xenopus_laevis RARS2.S 9 10 1
Xenopus_laevis SLC35A1.S 10 11 0
Xenopus_laevis LOC108717977 11 12 1
Latimeria_chalumnae DDX24 1 2 0
Latimeria_chalumnae PPP4R4 2 3 1
Latimeria_chalumnae SERPINA10B 3 4 0
Latimeria_chalumnae ARRDC3A 4 5 1
Latimeria_chalumnae LOC102360869 5 6 0
Latimeria_chalumnae CNR1 6 7 1
Latimeria_chalumnae SPACA1 7 8 0
Latimeria_chalumnae AKIRIN2 8 9 1
Latimeria_chalumnae ORC3 9 10 0
Latimeria_chalumnae RARS2 10 11 1
Latimeria_chalumnae LOC102362557 11 12 1
Protopterus_annectens LOC122794922 1 2 1
Protopterus_annectens LOC122794923 2 3 1
Protopterus_annectens LOC122794924 3 4 1
Protopterus_annectens FBXL5 4 5 1
Protopterus_annectens CC2D2A 5 6 0
Protopterus_annectens CNR1 6 7 1
Protopterus_annectens CPEB2 7 8 0
Protopterus_annectens BOD1L1 8 9 0
Protopterus_annectens C1QTNF7 9 10 0
Protopterus_annectens NKX3-2 10 11 1
Protopterus_annectens RAB28 11 12 1
Danio_rerio MYO6A 1 2 1
Danio_rerio LOC569340 2 3 0
Danio_rerio MEI4 3 4 1
Danio_rerio NT5E 4 5 1
Danio_rerio SNX14 5 6 0
Danio_rerio CNR1 6 7 0
Danio_rerio RNGTT 7 8 0
Danio_rerio PNRC1 8 9 1
Danio_rerio GABRR1 9 10 0
Danio_rerio GABRR2B 10 11 0
Danio_rerio UBE2J1 11 12 0
Oreochromis_niloticus SI:DKEY-174M14.3 1 2 1
Oreochromis_niloticus RDH14B 2 3 0
Oreochromis_niloticus LOC102078481 3 4 1
Oreochromis_niloticus RNGTT 4 5 1
Oreochromis_niloticus LOC112842425 5 6 0
Oreochromis_niloticus CNR1 6 7 1
Oreochromis_niloticus AKIRIN2 7 8 1
Oreochromis_niloticus RARS2 8 9 1
Oreochromis_niloticus SLC35A1 9 10 0
Oreochromis_niloticus LOC100692709 10 11 0
Oreochromis_niloticus LOC102081816 11 12 1
Scyliorhinus_canicula SLC35A1 1 2 1
Scyliorhinus_canicula RARS2 2 3 0
Scyliorhinus_canicula ORC3 3 4 1
Scyliorhinus_canicula AKIRIN2 4 5 0
Scyliorhinus_canicula LOC119967921 5 6 1
Scyliorhinus_canicula CNR1 6 7 0
Scyliorhinus_canicula RNGTT 7 8 0
Scyliorhinus_canicula LOC119967175 8 9 0
Scyliorhinus_canicula PNRC1 9 10 1
Scyliorhinus_canicula LOC119967178 10 11 1
Scyliorhinus_canicula LOC119967180 11 12 0
Petromyzon_marinus LOC116953416 1 2 0
Petromyzon_marinus LOC116953419 2 3 0
Petromyzon_marinus CEP162 3 4 1
Petromyzon_marinus FBXL22 4 5 0
Petromyzon_marinus RNGTT 5 6 1
Petromyzon_marinus CNR1 6 7 1
Petromyzon_marinus AKIRIN2 7 8 1
Petromyzon_marinus ORC3 8 9 0
Petromyzon_marinus RARS2 9 10 1
Petromyzon_marinus SLC35A1 10 11 0
Petromyzon_marinus RHBDL2 11 12 1
The start and stop values are not real genomic coordinates. They are serial numbers (1-2, 2-3, and so on) because I want the genes to sit adjacent to each other without any space between them.
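For illustration, serial coordinates of this kind can be generated from the gene order alone. The snippet below is a minimal sketch in base R on a toy subset of the table above. The data frame name `genes` is an assumption made for the example, and it assumes the rows are already sorted in gene order within each species.

```r
# Toy subset of the table above: two species, three genes each.
# `genes` is a hypothetical name used only in this example.
genes <- data.frame(
  species     = rep(c("Homo_sapiens", "Mus_musculus"), each = 3),
  symbol      = c("SLC35A1", "RARS2", "ORC3", "GABRR1", "PM20D2", "SRSF12"),
  orientation = c(1, 0, 1, 1, 0, 1)
)

# Number the genes 1, 2, 3, ... within each species, following row order.
genes$start <- ave(seq_len(nrow(genes)), genes$species, FUN = seq_along)

# Each gene ends where the next one starts, so the arrows touch.
genes$stop <- genes$start + 1

print(genes)
```

Because every gene gets a width of exactly one unit, the same x position means the same rank in the neighbourhood in every species, not the same genomic location.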
Using this data, I've generated a gene plot with arrows representing each gene's position and orientation. However, I'd like to add vertical lines connecting genes that have the same symbol value across different species. How can I achieve this? Is there any other package to help me with that?
2. Improving Color Palette for Publication Quality: I've used the scale_fill_manual function to define a color palette for the gene symbols in the plot. However, the current color palette doesn't meet the standards for publication quality. Could you please provide guidance on how to create a more visually appealing and suitable color palette for publication-ready plots?
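One possible direction for the palette, sketched under assumptions: with this many distinct symbols, no qualitative palette can give every gene a distinguishable colour, so the sketch colours only the symbols that occur in more than one species and sets all others to a neutral grey. It uses base R only. The names `genes`, `shared`, `unique_only` and `fill_values`, the choice of the "Dark 3" HCL palette and the grey shade are all illustrative assumptions, not a recommendation from any journal.

```r
# Toy subset of the table above (three species, three genes each).
genes <- data.frame(
  species = rep(c("Homo_sapiens", "Mus_musculus", "Gallus_gallus"), each = 3),
  symbol  = c("SPACA1", "CNR1", "RNGTT",
              "RNGTT", "CNR1", "SPACA1",
              "LOC101749895", "CNR1", "SPACA1")
)

# Count in how many species each symbol occurs.
n_species <- tapply(genes$species, genes$symbol,
                    function(s) length(unique(s)))

shared      <- names(n_species)[n_species > 1]   # conserved symbols
unique_only <- names(n_species)[n_species == 1]  # seen in one species only

# Qualitative HCL colours for the shared symbols, grey for the rest.
shared_cols <- hcl.colors(length(shared), palette = "Dark 3")
fill_values <- c(
  setNames(shared_cols, shared),
  setNames(rep("grey85", length(unique_only)), unique_only)
)

print(fill_values)
```

Since `fill_values` is a named vector, it could be passed as `scale_fill_manual(values = fill_values)`, which matches colours to symbols by name instead of by position.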
Here's the current code snippet I'm using to generate the gene plot:
library(ggplot2)
library(gggenes)
library(RColorBrewer)  # provides brewer.pal()

# Keep the species in the order in which they appear in the file
data$species <- factor(data$species, levels = unique(data$species), ordered = TRUE)

# Interpolate the 12 "Set3" colours to get one colour per gene symbol
colourCount <- length(unique(data$symbol))
getPalette <- colorRampPalette(brewer.pal(12, "Set3"))

ggplot(data, aes(xmin = start, xmax = stop, y = species, fill = symbol,
                 label = symbol, forward = orientation)) +
  geom_gene_arrow(arrowhead_height = unit(4, "mm"), arrowhead_width = unit(2, "mm")) +
  geom_gene_label(align = "centre") +
  facet_wrap(~ species, scales = "free", ncol = 1) +  # one panel per species
  scale_fill_manual(values = getPalette(colourCount)) +
  scale_x_continuous(expand = expansion()) +
  theme_genes() +
  labs(x = "", y = "") +
  theme(legend.position = "none",
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.line.x = element_blank())
This is the plot that I'm trying to achieve:
The plot comes up properly but I'm having an issue showing the connection between species. In the current plot, I've added the connections manually using Inkscape. Is there a way to do it through the script?
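A rough sketch of how such connections might be scripted, under several assumptions: the species are drawn in a single panel (no `facet_wrap()`), so that one layer can span two rows, and a `geom_segment()` layer joins the midpoints of genes that share a symbol in adjacent species. The names `genes`, `links`, `mid_upper` and `mid_lower` are made up for this example, and the toy data is a three-species subset of the table above.

```r
library(ggplot2)
library(gggenes)

# Toy subset of the table above (three species, three genes each).
genes <- data.frame(
  species = rep(c("Homo_sapiens", "Mus_musculus", "Gallus_gallus"), each = 3),
  symbol  = c("SPACA1", "CNR1", "RNGTT",
              "RNGTT", "CNR1", "SPACA1",
              "LOC101749895", "CNR1", "SPACA1"),
  start   = rep(5:7, times = 3),
  stop    = rep(6:8, times = 3),
  orientation = c(1, 0, 0, 1, 1, 0, 1, 1, 0)
)
genes$species <- factor(genes$species, levels = unique(genes$species))
genes$mid <- (genes$start + genes$stop) / 2  # x position of each gene centre

# For each pair of adjacent species, pair up the genes with the same symbol.
lv <- levels(genes$species)
links <- do.call(rbind, lapply(seq_len(length(lv) - 1), function(i) {
  upper <- genes[genes$species == lv[i],     c("species", "symbol", "mid")]
  lower <- genes[genes$species == lv[i + 1], c("species", "symbol", "mid")]
  merge(upper, lower, by = "symbol", suffixes = c("_upper", "_lower"))
}))

p <- ggplot(genes, aes(y = species)) +
  # Connections are drawn first so that the arrows sit on top of them.
  geom_segment(data = links, inherit.aes = FALSE, colour = "grey40",
               aes(x = mid_upper, xend = mid_lower,
                   y = species_upper, yend = species_lower)) +
  geom_gene_arrow(aes(xmin = start, xmax = stop, fill = symbol,
                      forward = orientation)) +
  geom_gene_label(aes(xmin = start, xmax = stop, label = symbol),
                  align = "centre") +
  scale_y_discrete(limits = rev(lv)) +  # first species at the top
  theme_genes() +
  theme(legend.position = "none") +
  labs(x = "", y = "")
p
```

The segments run from gene centre to gene centre, so part of each line is hidden behind the arrows. Starting them at the arrow edges would need numeric y positions instead of the discrete species axis, which this sketch does not attempt.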
I would greatly appreciate it if someone could provide a step-by-step solution for adding vertical lines connecting genes with the same symbol and suggest an improved color palette that would be suitable for publication-quality plots. Thank you in advance!