ggplot2: incorrect boxplot width when facetting with facets of different scales

1.3k Views Asked by At

I need a facetted boxplot. The x-axis for the plots is a quantitative variable, and I want to reflect this information on the plot. The scale of the abscissa is very different among the facets.

My problem is that the widths of the boxes are very small for the facet with the large scale.

A possible explanation is that the width of the boxes is the same for all facets, whereas it should ideally be determined by the xlims of each facet individually.

I would be grateful for two inputs:

  • Do you think this is a bug and should be reported ?
  • Do you have a solution ?

Thanks in advance !

Remark: transforming the abscissa to a categorical variable could be one solution, but it is not perfect as it would result in a loss of some information.

Minimal working example:

library(tidyverse)

c(1:4,7) %>% 
  c(.,10*.) %>% # Create abscissa on two different scales
  lapply(FUN = function(x) {tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))}) %>% # Create sample (y) and label (idx)
  bind_rows() %>% 
  ggplot(aes(x = x, y = y, group = x)) + 
  geom_boxplot() + 
  facet_wrap(~idx, scales = 'free') 

Result:

Result

A cumbersome solution would be to redraw the boxplot from scratch, but this is not very satisfying:

draw_boxplot = function(locations, width, ymin, lower, middle, upper, ymax, idx){

  local_df = tibble(locations = locations, width = width, ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax, idx = idx)

  ggplot(data = local_df) + 
    geom_rect(aes(xmin = locations - width/2, xmax = locations + width/2, ymin = lower, ymax = upper), fill = 'white', colour = 'black') + 
    geom_segment(aes(x = locations - width/2, xend = locations + width/2, y = middle, yend = middle), size = 0.8) + 
    geom_segment(aes(x = locations, xend = locations, y = upper, yend = ymax)) + 
    geom_segment(aes(x = locations, xend = locations, y = lower, yend = ymin)) + 
    facet_wrap(~idx, scales = 'free_x')
}

make_boxplot = function(to_plot){
  to_plot %>% 
    cmp_boxplot %>% 
    (function(x){
      draw_boxplot(locations = x$x, width = x$width, ymin = x$y0, lower = x$y25, middle = x$y50, upper = x$y75, ymax = x$y100, idx = x$idx)
    })

}


cmp_boxplot = function(to_plot){
  to_plot %>% 
    group_by(idx) %>% 
    mutate(width = 0.6*(max(x) - min(x))/length(unique(x))) %>% #hand specified width
    group_by(x) %>% 
    mutate(y0 = min(y),
           y25 = quantile(y, 0.25),
           y50 = median(y),
           y75 = quantile(y, 0.75),
           y100 = max(y)) %>% 
    select(-y) %>% 
    unique()
}

c(1:4,7) %>% 
  c(.,10*.) %>% 
  lapply(FUN = function(x) {tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))}) %>% 
  bind_rows() %>% 
  make_boxplot

Result:

Result

1

There are 1 best solutions below

1
On BEST ANSWER

Since geom_boxplot doesn't allow varying width as an aesthetic, you have to write your own. Fortunately it's not too complicated.

bp_custom <- function(vals, type) {

  bp = boxplot.stats(vals)

  if(type == "whiskers") {
    y    = bp$stats[1]
    yend = bp$stats[5]
    return(data.frame(y = y, yend = yend))
  }

  if(type == "box") {
    ymin = bp$stats[2]
    ymax = bp$stats[4]
    return(data.frame(ymin = ymin, ymax = ymax))
  }

  if(type == "median") {
    y    = median(vals)
    yend = median(vals)
    return(data.frame(y = y, yend = yend))
  }

  if(type == "outliers") {
    y = bp$out
    return(data.frame(y = y))
  } else {
    return(warning("Type must be one of 'whiskers', 'box', 'median', or 'outliers'."))
  }
}

This function does all the computation and returns dataframes suitable for use in stat_summary. Then we call it in several different layers to construct the various bits of a boxplot. Note that you need to compute the width of the boxplot per group of the facet, done below using dplyr in your pipe. I calculated the width such that the range of x gets split up into equal segments based on the number of unique x values, then each box gets about 1/2 the width of that segment. Your data may need a different adjustment.

library(dplyr)

c(1:4,7) %>% 
  c(.,10*.) %>% # Create abscissa on two different scales
  lapply(FUN = function(x) {
    tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))
  }) %>% 
  bind_rows() %>%                                
  group_by(idx) %>%                                              # NOTE THIS LINE
  mutate(width = 0.25*diff(range(x))/length(unique(x))) %>%      # NOTE THIS LINE
  ggplot(aes(x = x, y = y, group = x)) +
  stat_summary(fun.data = bp_custom, fun.args = "whiskers",
               geom = "segment", aes(xend = x)) + 
  stat_summary(fun.data = bp_custom, fun.args = "box", 
               geom = "rect", aes(xmin = x - width, xmax = x + width), 
               fill = "white", color = "black") + 
  stat_summary(fun.data = bp_custom, fun.args = "median", 
               geom = "segment", aes(x = x - width, xend = x + width), size = 1.5) + 
  stat_summary(fun.data = bp_custom, fun.args = "outliers", 
               geom = "point") + 
  facet_wrap(~idx, scales = 'free') 

enter image description here

As for reporting this as a bug (actually a desired feature), I think it's an infrequent enough use case that they won't prioritize it. If you wrap this code up into a custom geom (based on here) and submit a pull-request, you might get more luck.