Brief overview of the figure:
- There is a bin for each gene that carries an insertion in the genome of at least one organism in the population.
- The population contains four lineages. The graph is meant to convey how insertions are distributed across genes, and how frequent they are, within each lineage.
- The height of each bin is the decimal fraction of organisms that have an insertion in the corresponding gene.
- Each gene belongs to a functional category, which is indicated by the bin color and legend.
My problem: I cannot figure out how to separate the collection of bins into 4 subgraphs--one corresponding to each lineage--while maintaining the coloring and organization (all the colors being together and in the same order within each lineage subgraph).
Here is an illustrative subset of my data:
Lineage Genes Category V3
EAS EAS1 cell wall and cell processes 0.1071428571
EAS EAS2 conserved hypotheticals 0.1071428571
EAS EAS3 PE/PPE 0.0357142857
EAS EAS4 lipid metabolism 0.0357142857
EAS EAS5 lipid metabolism 0.0357142857
EAS EAS6 conserved hypotheticals 0.0357142857
EAS EAS7 conserved hypotheticals 0.0357142857
EAS EAS8 conserved hypotheticals 0.0357142857
EAI EAI3 PE/PPE .111
EAI EAI4 PE/PPE .111
EAI EAI5 conserved hypotheticals .111
EAI EAI6 intermediary metabolism and respiration .111
EAI EAI7 cell wall and cell processes .222
EAI EAI8 intermediary metabolism and respiration .111
EAI EAI9 conserved hypotheticals .111
IO IO1 information pathways 0.1666666667
IO IO2 virulence, detoxification, adaptation 0.1666666667
IO IO3 conserved hypotheticals 0.1666666667
IO IO4 cell wall and cell processes 0.3333333333
IO IO5 PE/PPE 0.3333333333
IO IO6 intermediary metabolism and respiration 0.1666666667
IO IO7 PE/PPE 0.1666666667
IO IO8 PE/PPE 0.1666666667
IO IO9 insertion seqs and phages 0.3333333333
EAM EAM1 insertion seqs and phages 0.2727272727
EAM EAM2 cell wall and cell processes 0.0454545455
EAM EAM3 lipid metabolism 0.0454545455
EAM EAM4 conserved hypotheticals 0.0454545455
And the code I've run to generate the current graph:
# Loading ggplot2 and my data
library(ggplot2)
PanppData <- read.table("Allpp", header = TRUE, sep = "\t")
# Making ordering depend on Category
PanppData$Genes <- factor(PanppData$Genes, levels = PanppData$Genes[order(PanppData$Category)])
# Generating the graph
ggplot(PanppData, aes(x = Genes, y = V3)) +
  geom_bar(aes(fill = Category),
           stat = "identity",
           colour = "white",
           position = "dodge") +
  ggtitle("EAS EAI IO EAM") +
  labs(x = "Genes", y = "Decimal Fraction of Isolates")
I would like the graph to maintain its order and coloring, but be separated into four subgraphs--one for each Lineage--each containing bins for all genes within that lineage.
I've tried using melt with no success.
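For reference, here is a minimal sketch of the layout described above. It assumes that ggplot2's `facet_wrap()` with `scales = "free_x"` is an acceptable way to get one panel per lineage. The data frame is a small hand-typed subset of the sample data with rounded values, not the real file, and the rotated axis labels are only a readability guess.

```r
library(ggplot2)

# Hand-typed subset of the sample data (values rounded); an assumption
# standing in for the real tab-separated "Allpp" file.
PanppData <- data.frame(
  Lineage  = c("EAS", "EAS", "EAS", "EAI", "EAI", "EAI",
               "IO", "IO", "IO", "EAM", "EAM", "EAM"),
  Genes    = c("EAS1", "EAS2", "EAS3", "EAI3", "EAI5", "EAI7",
               "IO3", "IO4", "IO5", "EAM1", "EAM2", "EAM4"),
  Category = c("cell wall and cell processes", "conserved hypotheticals", "PE/PPE",
               "PE/PPE", "conserved hypotheticals", "cell wall and cell processes",
               "conserved hypotheticals", "cell wall and cell processes", "PE/PPE",
               "insertion seqs and phages", "cell wall and cell processes",
               "conserved hypotheticals"),
  V3       = c(0.1071, 0.1071, 0.0357, 0.111, 0.111, 0.222,
               0.1667, 0.3333, 0.3333, 0.2727, 0.0455, 0.0455),
  stringsAsFactors = FALSE
)

# Order the gene levels by Category so bars of one colour sit together
PanppData$Genes <- factor(PanppData$Genes,
                          levels = PanppData$Genes[order(PanppData$Category)])

# Fix the panel order instead of relying on alphabetical order
PanppData$Lineage <- factor(PanppData$Lineage,
                            levels = c("EAS", "EAI", "IO", "EAM"))

p <- ggplot(PanppData, aes(x = Genes, y = V3, fill = Category)) +
  geom_bar(stat = "identity", colour = "white") +
  # One panel per lineage; free_x drops genes that belong to other lineages
  facet_wrap(~ Lineage, nrow = 1, scales = "free_x") +
  labs(x = "Genes", y = "Decimal Fraction of Isolates") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(p)
```

Because the gene levels are ordered by Category across the whole data set, each panel should keep the categories grouped and in the same relative order, and the fill legend is shared across panels.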
I've only been using R for a few weeks, and I have spent well over ten hours on this graph while preparing for an upcoming conference. Any help or direction is greatly appreciated!
I hope that my question is clear. This is my first time posting a question to Stack Overflow, so please let me know if I can do anything to make it clearer or more useful.