How to separate graph into subgraphs with ggplot while retaining color and ordering formatting?


Brief overview of the figure:

  1. There is a bin for each gene that carries an insertion in the genome of at least one organism in the population.
  2. The population contains four lineages; the graph is meant to convey how insertions are distributed across those lineages and how frequent each insertion is within its lineage.
  3. The height of each bin is the decimal fraction of organisms that have an insertion in the gene the bin corresponds to.
  4. Each gene belongs to a functional category, which is indicated by the bin color and legend.

My problem: I cannot figure out how to separate the collection of bins into 4 subgraphs, one for each lineage, while keeping the coloring and organization (bins of the same color grouped together, with the colors in the same order within each lineage subgraph).

Here is an illustrative subset of my data:

Lineage Genes   Category    V3
EAS EAS1    cell wall and cell processes    0.1071428571
EAS EAS2    conserved hypotheticals 0.1071428571
EAS EAS3    PE/PPE  0.0357142857
EAS EAS4    lipid metabolism    0.0357142857
EAS EAS5    lipid metabolism    0.0357142857
EAS EAS6    conserved hypotheticals 0.0357142857
EAS EAS7    conserved hypotheticals 0.0357142857
EAS EAS8    conserved hypotheticals 0.0357142857
EAI EAI3    PE/PPE  .111
EAI EAI4    PE/PPE  .111
EAI EAI5    conserved hypotheticals .111
EAI EAI6    intermediary metabolism and respiration .111
EAI EAI7    cell wall and cell processes    .222
EAI EAI8    intermediary metabolism and respiration .111
EAI EAI9    conserved hypotheticals .111
IO  IO1 information pathways    0.1666666667
IO  IO2 virulence, detoxification, adaptation   0.1666666667
IO  IO3 conserved hypotheticals 0.1666666667
IO  IO4 cell wall and cell processes    0.3333333333
IO  IO5 PE/PPE  0.3333333333
IO  IO6 intermediary metabolism and respiration 0.1666666667
IO  IO7 PE/PPE  0.1666666667
IO  IO8 PE/PPE  0.1666666667
IO  IO9 insertion seqs and phages   0.3333333333
EAM EAM1    insertion seqs and phages   0.2727272727
EAM EAM2    cell wall and cell processes    0.0454545455
EAM EAM3    lipid metabolism    0.0454545455
EAM EAM4    conserved hypotheticals 0.0454545455

And the code I've run to generate the current graph:

library(ggplot2)

# Loading in my data (tab-separated, with a header row)
PanppData <- read.table("Allpp", header = TRUE, sep = "\t")

# Making the gene ordering depend on Category
PanppData$Genes <- factor(PanppData$Genes, levels = PanppData$Genes[order(PanppData$Category)])

# Generating the graph; the title is a manual stand-in for per-lineage labels
ggplot(PanppData, aes(x = Genes, y = V3)) +
  geom_bar(aes(fill = Category),
           stat = "identity",
           colour = "white",
           position = "dodge") +
  ggtitle("EAS            EAI               IO            EAM") +
  labs(x = "Genes", y = "Decimal Fraction of Isolates")

I would like the graph to keep its order and coloring but be separated into four subgraphs, one for each Lineage, each containing bins for all genes within that lineage.
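One direction that may be worth trying is ggplot2's faceting. The sketch below is a minimal example, not a tested solution for the full data set: it uses a small stand-in data frame built from a few of the rows above instead of reading the "Allpp" file, and it assumes that one panel per Lineage with its own x axis (scales = "free_x") is what is wanted. Because the fill scale is shared across panels, each Category should get the same color in every panel, and the Category-based gene ordering carries over within each panel.

```r
library(ggplot2)

# Stand-in data: a handful of rows copied from the sample above.
# The real data would come from read.table("Allpp", header = TRUE, sep = "\t").
PanppData <- data.frame(
  Lineage  = c("EAS", "EAS", "EAS", "EAI", "EAI", "IO", "IO", "EAM", "EAM"),
  Genes    = c("EAS1", "EAS2", "EAS3", "EAI3", "EAI5", "IO1", "IO5", "EAM1", "EAM2"),
  Category = c("cell wall and cell processes", "conserved hypotheticals", "PE/PPE",
               "PE/PPE", "conserved hypotheticals",
               "information pathways", "PE/PPE",
               "insertion seqs and phages", "cell wall and cell processes"),
  V3       = c(0.1071428571, 0.1071428571, 0.0357142857, 0.111, 0.111,
               0.1666666667, 0.3333333333, 0.2727272727, 0.0454545455),
  stringsAsFactors = FALSE
)

# Fix the category levels once so all panels share one ordering.
PanppData$Category <- factor(PanppData$Category)

# Order genes by category (gene names are unique, so they are valid levels).
PanppData$Genes <- factor(PanppData$Genes,
                          levels = PanppData$Genes[order(PanppData$Category)])

# Keep the lineages in the order they appear in the data.
PanppData$Lineage <- factor(PanppData$Lineage,
                            levels = unique(PanppData$Lineage))

p <- ggplot(PanppData, aes(x = Genes, y = V3, fill = Category)) +
  geom_col(colour = "white") +
  # One panel per lineage; free_x drops genes that belong to other lineages.
  facet_wrap(~ Lineage, nrow = 1, scales = "free_x") +
  labs(x = "Genes", y = "Decimal Fraction of Isolates") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(p)
```

With faceting, the strip label above each panel would take the place of the hand-spaced ggtitle() string. If panel widths should scale with the number of genes, facet_grid(. ~ Lineage, scales = "free_x", space = "free_x") is an alternative to facet_wrap().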

I've tried using melt with no success.

I've only been using R for a few weeks, and have been working for double digit hours on this graph preparing for an upcoming conference. Any help or direction is greatly appreciated!!

Current Graph

I hope that my question is clear. This is my first time posting a question to Stack Overflow, so please let me know if I can do anything to make my question clearer or more useful.
