Geom tile white space issue when the x variable is spread unevenly across facet grids


I'm trying to produce a heat map of gene expression from samples of different conditions, faceted by the conditions:

require(reshape2)
set.seed(1)
expression.mat <- matrix(rnorm(100*1000),nrow=100)
df <- reshape2::melt(expression.mat)
colnames(df) <- c("gene","sample","expression")
df$condition <- factor(c(rep("C1",2500),rep("C2",3500),rep("C3",3800),rep("C4",200)),levels=c("C1","C2","C3","C4"))

I'd like to color by expression range:

df$range <- cut(df$expression,breaks=6)

The width parameter in ggplot's aes controls the width of the tiles in each facet. My question is how to find a width value for which the figure is not distorted.

I played around a bit with this plot command:

require(ggplot2)
ggplot(df, aes(x = sample, y = gene, fill = range, width = 100)) +
    facet_grid(~condition, scales = "free") +
    geom_tile(color = NA) +
    labs(x = "condition", y = "gene") +
    theme_bw()

Setting width below 100 leaves gaps in the last facet (the one with the fewest samples), and already at a width of 100 the rightmost column in the leftmost facet is distorted (wider than the columns to its left).

So my question is how to fix this, or how to find a width that doesn't cause it.

There is 1 best solution below

Edit showing the issue with the sample variable faceted by condition

There are no C1 samples between 25 and 100, because those sample numbers belong to C2, C3 and C4. Here is an illustration for samples below 200.

ggplot(df[df$sample < 200, ],
       aes(x=sample, y = gene, fill=range)) +
    geom_tile() +
    facet_grid(~condition)

[Figure: plot showing the issue for samples below 200]

The number of samples is not the same in all facets, and faceting on condition creates holes between the sample numbers within each condition.
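
To see how large the holes are, the biggest jump between consecutive sample numbers can be computed per condition. This is a minimal base R sketch, assuming that the 10,000-element condition vector from the question is recycled across the 100,000 rows of df, so that the pattern of 25, 35, 38 and 2 samples repeats every 100 samples. The names pattern, sample_condition and max_gap are made up for this illustration.

```r
# Assumed layout: the condition pattern repeats every 100 samples
samples <- 1:1000
pattern <- c(rep("C1", 25), rep("C2", 35), rep("C3", 38), rep("C4", 2))
sample_condition <- factor(rep(pattern, length.out = length(samples)),
                           levels = c("C1", "C2", "C3", "C4"))

# Largest jump between consecutive sample numbers within each condition
max_gap <- tapply(samples, sample_condition, function(s) max(diff(s)))
max_gap
```

Under that assumption the largest jump is in C4 (from sample 100 to sample 199, a gap of 99), which is consistent with the observation in the question that widths below 100 leave white space in the last facet.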

One way around this problem is to create a new index, sample2, that numbers the samples consecutively within each condition. I use the dplyr package.

library(dplyr)
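# One row per (condition, sample) pair, numbered consecutively within each condition
sample2 <- df %>% 
    group_by(condition) %>% 
    distinct(sample) %>% 
    mutate(sample2 = row_number()) %>% 
    ungroup()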

df <- df %>% 
    left_join(sample2, by = c("condition", "sample"))
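
For readers without dplyr, the same re-indexing idea can be sketched in base R. The example below uses a made-up toy data frame (toy, with conditions A and B), not the data from the question, and relies only on ave() and match().

```r
# Toy data: 2 genes, 7 samples; samples 1, 2, 5, 6 are condition A, the rest B
toy <- expand.grid(gene = 1:2, sample = 1:7)
toy$condition <- ifelse(toy$sample %in% c(1, 2, 5, 6), "A", "B")

# Rank each sample number among the distinct sample numbers of its condition
toy$sample2 <- ave(toy$sample, toy$condition,
                   FUN = function(s) match(s, sort(unique(s))))

unique(toy[, c("condition", "sample", "sample2")])
```

Here samples 1, 2, 5, 6 in A become 1, 2, 3, 4 and samples 3, 4, 7 in B become 1, 2, 3, so there are no holes left on the x axis within a condition.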

Then plot using sample2 as the x variable

ggplot(df,aes(x = sample2, y = gene,
              fill = range))+
    facet_grid(~condition) + 
    geom_tile(color=NA) + theme_bw()

[Figure: updated plot using sample2]

Use the scales argument to let the x axis scale vary between facets:

ggplot(df,aes(x = sample2, y = gene,
              fill = range))+
    facet_grid(~condition, scales = "free") + 
    geom_tile() + theme_bw()
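
With free x scales the tiles in the smaller facets are stretched to fill the panel. If tiles of equal width in every facet are preferred, facet_grid also has a space argument. Below is a minimal sketch on made-up toy data (the toy data frame and its sizes are assumptions for illustration, not the data from the question), where space = "free_x" makes each panel's width proportional to its x range.

```r
library(ggplot2)

set.seed(1)
# Toy data: 20 genes; condition A has 6 samples, condition B has 2
toy <- expand.grid(gene = 1:20, sample2 = 1:6, condition = c("A", "B"))
toy <- toy[!(toy$condition == "B" & toy$sample2 > 2), ]
toy$expression <- rnorm(nrow(toy))
toy$range <- cut(toy$expression, breaks = 6)

# Free x scales plus free x space: panel width follows the number of samples
p <- ggplot(toy, aes(x = sample2, y = gene, fill = range)) +
    geom_tile(color = NA) +
    facet_grid(~condition, scales = "free_x", space = "free_x") +
    theme_bw()
p
```

The trade-off is that a condition with very few samples, like C4 in the question, gets a very narrow panel.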

Old answer using width

See for example this answer.

Adding a width aesthetic produces wider columns:

ggplot(df,aes(x = sample, y = gene,
              fill = range, width = 50))+
    facet_grid(~condition) + 
    geom_tile(color=NA) + 
    labs(x="condition",y="gene")+theme_bw()