I have a data frame that looks like the following:
set.seed(1)
mydf <- data.frame()
for (g in LETTERS[1:4]){
  m <- data.frame(Group=g,
                  Gene=paste(sample(letters[1:4],25,replace=TRUE), sample(1:25,25,replace=FALSE), sep=''),
                  FoldChange=runif(25, -2, 2))
  mydf <- rbind(mydf, m)
}
mydf$UpDown <- "DOWN"
mydf$UpDown[which(mydf$FoldChange>0)] <- "UP"
head(mydf)
  Group Gene  FoldChange UpDown
1     A  b10 -0.08952151   DOWN
2     A   b1  1.44483791     UP
3     A   c9 -0.24761157   DOWN
4     A  d20 -1.02081089   DOWN
5     A   a8 -1.71728381   DOWN
6     A  d25 -1.60213536   DOWN
I wanted to show the intersection of Genes across Groups, and so I made a Venn diagram:
library(VennDiagram)  # provides venn.diagram()
mylist <- split(as.character(mydf$Gene), list(mydf$Group))
venn.diagram(mylist, filename="test.png", height=1000, width=1000, imagetype="png", units="px")
However, I would really like to show the FoldChange (or at least the UpDown) values somehow. I thought of doing something like this, splitting the overlapping numbers into UP and DOWN Genes:
but there are still cases where a given Gene is UP in one Group and DOWN in another, so the above Venn diagram would be quite inaccurate...
subset(mydf, Gene=='b16')
   Group Gene FoldChange UpDown
16     A  b16 -0.9679329   DOWN
34     B  b16  0.5711820     UP
90     D  b16 -1.1147763   DOWN
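A quick way to list every Gene whose direction differs between Groups, as a minimal sketch in base R. The three b16 rows are copied from the output above; the a1 rows and the variable names are made up for illustration.

```r
# Toy data: b16 rows come from the output above, a1 rows are invented
toy <- data.frame(
  Group = c("A", "B", "D", "A", "C"),
  Gene = c("b16", "b16", "b16", "a1", "a1"),
  FoldChange = c(-0.9679329, 0.5711820, -1.1147763, 0.4, 1.2),
  stringsAsFactors = FALSE
)
toy$UpDown <- ifelse(toy$FoldChange > 0, "UP", "DOWN")

# Count the distinct directions observed for each Gene
n_dir <- tapply(toy$UpDown, toy$Gene, function(x) length(unique(x)))

# Genes seen as both UP and DOWN in different Groups
discordant <- names(n_dir)[n_dir > 1]
print(discordant)
```

On the full data frame the same two lines would run on mydf instead of toy.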
I am thinking that the best way of showing this would be a Circos plot instead. It should have one section per Group, linking the shared Genes between groups, and including the FoldChange (or UpDown) information.
I can think of two ways the information can be included:
1- Linking lines between A and B (for example) would be colored red if the Gene is DOWN in both Groups, and blue if it is UP in both Groups. They would be colored red turning to blue if the Gene is DOWN in A and UP in B, and blue turning to red if the opposite happens (does that make sense?)
2- Include an extra band of information in the Circos plot with the FoldChange values (red for negative bars, and blue for positive ones). It would be nice if the chunk of Genes that overlap were all together and ordered according to FoldChange values (instead of thin hairs here and there). Something similar to this probably:
However, I really have no idea how to even start. I tried making simple Circos plots in the past using the circlize package, and totally failed at it.
I think the concept of what I want to accomplish is fairly simple... Does anyone have a clue of how to show it clearly on a Circos plot (or for that matter, any other representation you could suggest)?
Many thanks!
Although this is an old question, I will try to answer it since it is still unanswered. Please note that I used your question to learn how to use circlize, so apologies in advance if this is not optimal or if there are any errors; I will appreciate any feedback.
I must say that I do not solve it in the way you proposed (connecting groups), because that would require massaging the input data to transform it into an adjacency matrix connecting groups.
It is possible, however, to directly represent most of the information you want to observe with the input data you provide, which may be helpful.
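For reference, the massaging mentioned above might look roughly like this. It is only a sketch on made-up toy data (the gene names g1 to g3 and all variable names are hypothetical): it counts how many Genes each pair of Groups shares, which gives a group-by-group matrix of the kind chordDiagram() accepts. The plotting call is skipped if circlize is not installed.

```r
# Toy input (made up): one row per Group/Gene observation
toy <- data.frame(
  Group = c("A", "A", "B", "B", "C", "D"),
  Gene = c("g1", "g2", "g1", "g3", "g1", "g2"),
  stringsAsFactors = FALSE
)

# Gene x Group incidence matrix: 1 if the Gene was seen in the Group
inc <- (unclass(table(toy$Gene, toy$Group)) > 0) * 1

# adj[i, j] = number of Genes shared by Group i and Group j
adj <- crossprod(inc)
diag(adj) <- 0  # a Group trivially shares all of its Genes with itself
print(adj)

if (requireNamespace("circlize", quietly = TRUE)) {
  # symmetric = TRUE uses only the lower triangle, so each pair is drawn once
  circlize::chordDiagram(adj, symmetric = TRUE)
  circlize::circos.clear()
}
```

This loses the per-Gene FoldChange information, which is one reason I kept the Genes as separate sectors instead.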
First I create a vector of friendly colors, since the default ones are sometimes too dark:
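The exact colors are a matter of taste. As a rough sketch, a named color vector can be built like this; the palette choice ("Set 2" from hcl.colors(), which needs R 3.6 or later), the grey for gene sectors, and the two stand-in gene names are assumptions for illustration. chordDiagram() accepts such a named vector through its grid.col argument.

```r
groups <- LETTERS[1:4]
genes <- c("b1", "b16")  # stand-ins for the real gene names

# One light qualitative color per Group, a neutral grey for every Gene
grid_col <- c(
  setNames(hcl.colors(length(groups), palette = "Set 2"), groups),
  setNames(rep("grey80", length(genes)), genes)
)

# Semi-transparent versions of the Group colors, e.g. for the links
link_col <- adjustcolor(grid_col[groups], alpha.f = 0.5)

print(grid_col)
```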
There are problems exporting transparency to other formats, so I will create a PDF. I first create a chord diagram with no annotations, because there are too many gene labels; I will add them later. I store the diagram in a variable called cdm_res, which will help me later to retrieve the positions of the sectors, etc.
We have the groups and genes connected. Now I add the labels of the groups.
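These two steps can be sketched as follows. This is a simplified illustration on made-up toy data, not necessarily the exact code behind the figure: the output file name, the track height, and the gene names are assumptions. It draws the diagram without the default annotation tracks, reserves one empty track, and writes only the Group names into it. Everything is skipped if circlize is not installed.

```r
# Toy input (made up): one row per Group/Gene observation
toy <- data.frame(
  Group = c("A", "A", "B", "B", "C", "D"),
  Gene = c("g1", "g2", "g1", "g3", "g1", "g2"),
  stringsAsFactors = FALSE
)

if (requireNamespace("circlize", quietly = TRUE)) {
  library(circlize)
  pdf(file.path(tempdir(), "chord_sketch.pdf"))  # hypothetical file name

  # No default labels or grid; keep one empty outer track for later use
  cdm_res <- chordDiagram(toy[, c("Group", "Gene")],
                          annotationTrack = NULL,
                          preAllocateTracks = list(track.height = 0.1))

  # Label only the Group sectors, at the bottom of the reserved track
  for (g in unique(toy$Group)) {
    xc <- get.cell.meta.data("xcenter", sector.index = g, track.index = 1)
    yl <- get.cell.meta.data("ylim", sector.index = g, track.index = 1)
    circos.text(xc, yl[1], g, sector.index = g, track.index = 1,
                adj = c(0.5, 0), niceFacing = TRUE)
  }

  circos.clear()
  invisible(dev.off())
}
```

The returned data frame has one row per link, including the sector names and the link positions within each sector.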
Next, I create a rectangle for each observation, with height proportional to the absolute value of FoldChange, red if it is positive and green if it is negative. In this way you can see when there are changes for the same gene. This could possibly be done by creating another track as well.
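The rectangle attributes can be computed before any drawing. A minimal sketch on toy data: the b16 values are from the question, while the a1 rows and the column names rect_height and rect_col are made up. The results could then be passed to circos.rect() together with the link positions stored in cdm_res.

```r
# Toy data: b16 rows come from the question, a1 rows are invented
toy <- data.frame(
  Group = c("A", "B", "D", "A", "C"),
  Gene = c("b16", "b16", "b16", "a1", "a1"),
  FoldChange = c(-0.9679329, 0.5711820, -1.1147763, 0.4, 1.2),
  stringsAsFactors = FALSE
)

# Height as a fraction of the tallest bar, so it fits a track with ylim 0..1
toy$rect_height <- abs(toy$FoldChange) / max(abs(toy$FoldChange))

# Red for positive FoldChange, green for negative
toy$rect_col <- ifelse(toy$FoldChange > 0, "red", "green")

print(toy)
```

With this layout b16 gets two green bars and one red bar, which is how a change of direction for the same gene shows up.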
Finally, I add the labels at the end of the highest rectangle.
And here is the result: