R: Group intersections in circos plots showing extra band with variable values

666 Views Asked by At

I have a data frame that looks like the following:

set.seed(1)
mydf <- data.frame()
for (g in LETTERS[1:4]){
    m <- data.frame(Group=g,
                    Gene=paste(sample(letters[1:4],25,replace=TRUE), sample(1:25,25,replace=FALSE), sep=''),
                    FoldChange=runif(25, -2, 2))
    mydf <- rbind(mydf, m)
}
mydf$UpDown <- "DOWN"
mydf$UpDown[which(mydf$FoldChange>0)] <- "UP"
head(mydf)

  Group Gene  FoldChange UpDown
1     A  b10 -0.08952151   DOWN
2     A   b1  1.44483791     UP
3     A   c9 -0.24761157   DOWN
4     A  d20 -1.02081089   DOWN
5     A   a8 -1.71728381   DOWN
6     A  d25 -1.60213536   DOWN

I wanted to show the intersection of Genes across Groups, and so I made a Venn diagram:

mylist <- split(as.character(mydf$Gene), list(mydf$Group))
venn.diagram(mylist, filename="test.png", height=1000, width=1000, imagetype="png", units="px")

test1

However, I would really like to show somehow the FoldChange (or at least the UpDown) values. I thought of doing something like this, splitting the overlapping numbers into UP and DOWN Genes:

test2

but there are still cases of a given Gene that can be UP in one Group and DOWN in other, so the above Venn diagram would be quite inaccurate...

subset(mydf, Gene=='b16')
   Group Gene FoldChange UpDown
16     A  b16 -0.9679329   DOWN
34     B  b16  0.5711820     UP
90     D  b16 -1.1147763   DOWN

I am thinking that the best way of showing this would be a Circos plot instead.

It should have one section per Group, linking the shared Genes between groups, and including the FoldChange (or UpDown) information.

I can think of two ways the information can be included:

1- Linking lines between A and B (for example) would be colored red if the Gene is DOWN in both Groups, and blue if it is UP in both Groups. They would be colored red turning to blue if the Gene is DOWN in A and UP in B, and blue turning to red if the opposite happens (does that make sense?)

2- Include an extra band of information to the Circos plot with the FoldChange values (red for negative bars, and blue for positive ones). It would be nice that the chunk of Genes that overlap are all together (instead of thin hairs here and there, and ordered according to FoldChange values). Something similar to this probably: test3

However, I really have no idea how to even start, I tried making simple Circos plots in the past using the circlize package, and totally failed at it.

I think the concept of what I want to accomplish is fairly simple... Does anyone have a clue of how to show it clearly on a Circos plot (or for that matter, any other representation you could suggest)?

Many thanks!

1

There are 1 best solutions below

0
On

Although this an old question I will try to answer it since it is still an unsanswered. Please note that I used your question to learn how to use circlize, so apologies in advance if it is not optimal or if there is any error, I will appreciate any feedback.

I must say that I do not solve in the way you proposed (connecting groups) because it would require massaging the input data to transform it into an adjacency matrix connecting groups.

It is possible, however, to directly represent most of the information you want to observe with the input data you provide, which may be helpful.

First I create a vector with friendly colors, those coming by default are sometimes too dark:

library(RColorBrewer)
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
Nsectors = length(unique(mydf$Group))+length(unique(mydf$Gene)) 
col_vector = col_vector[1:Nsectors]

There are problems exporting transparency in other formats, so I will create a pdf. I create first a chord diagram with no annotations, because there are too many labels for the genes, I will add them later. I store the diagram in a variable called cdm_res that will help me later to retrieve the positions of the sectors, etc.

pdf(file = "circlize_example.pdf",width = 10, height = 10)
# -- Create a chord diagram with no labels nor axis
cdm_res = chordDiagram(x = mydf[,c(2,1,3)],  # I change the order, largest groups on top
              annotationTrack = "grid", # I create it without sector names, it is too crowded
              grid.col = col_vector)

We have the groups and genes connected. Now I add the labels of the groups.

# -- Add now labels, etc.
for(si in unique(mydf$Group)) { # Now I add the labels for Group and the axis
  xlim = get.cell.meta.data("xlim", sector.index = si, track.index = 1)
  ylim = get.cell.meta.data("ylim", sector.index = si, track.index = 1)
  circos.text(mean(xlim), mean(ylim), si, sector.index = si, track.index = 1, 
              facing = "outside", cex=1.2, niceFacing = TRUE)
  circos.axis(h = "top", labels.cex = 0.5, sector.index = si,track.index=1)
}

Next, I create a rectangle for each observation with the height proportional to the absolute value of FoldChange, being red if it is positive and green if it is negative. In this way you can see when there are changes for the same gene. This could be possibly be done creating another track as well.

y1 = ylim[1] + 1.2* (ylim[2] - ylim[1]) # The rectangle will start at the top of the coloured sectors

  for(i in seq_len(nrow(cdm_res))) {
    circos.rect(cdm_res[i, "x1"], y1, cdm_res[i, "x1"] - 
                  abs(cdm_res[i, "value1"]), y1 + abs(cdm_res[i, "value1"]),#(y2-y1)*0.45, 
                col = ifelse(cdm_res[i,"value1"]>0,"red","green"), 
                border = ifelse(cdm_res[i,"value1"]>0,"red","green"),
                sector.index = cdm_res$rn[i], track.index = 1)
  } 

Finally, I add the labels at the end of the highest rectangle.

for(si in unique(mydf$Gene)) { # For Gene I skip axis and reduce label's size
  xlim = get.cell.meta.data("xlim", sector.index = si, track.index = 1)
  #ylim = get.cell.meta.data("ylim", sector.index = si, track.index = 1)
  idx=which(cdm_res$rn == si) # which are the indxs for this sector?
  height=max(abs(cdm_res$value1[idx])) # we look for the maximum value
  y2_label=4+height # to locate our label at the top of the rect, a little bit further
  circos.text(mean(xlim), mean(ylim), si, sector.index = si, track.index = 1, 
              facing = "outside", 
              cex=0.5, # here the challenge, if the labels are large they will overlap 
              adj=c(degree(0),degree(y2_label)), # increase the offset to make labels larger
              niceFacing = TRUE)
  
}

circos.clear()
dev.off()

And here the result: chord diagram with circlize