I need a facetted boxplot. The x-axis for the plots is a quantitative variable, and I want to reflect this information on the plot. The scale of the abscissa is very different among the facets.
My problem is that the widths of the boxes are very small for the facet with the large scale.
A possible explanation is that the width of the boxes is the same for all facets, whereas it should ideally be determined by the xlims of each facet individually.
I would be grateful for two inputs:
- Do you think this is a bug and should be reported ?
- Do you have a solution ?
Thanks in advance !
Remark: transforming the abscissa to a categorical variable could be one solution, but it is not perfect as it would result in a loss of some information.
Minimal working example:
library(tidyverse)
c(1:4,7) %>%
c(.,10*.) %>% # Create abscissa on two different scales
lapply(FUN = function(x) {tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))}) %>% # Create sample (y) and label (idx)
bind_rows() %>%
ggplot(aes(x = x, y = y, group = x)) +
geom_boxplot() +
facet_wrap(~idx, scales = 'free')
Result:
A cumbersome solution would be to redraw the boxplot from scratch, but this is not very satisfying:
draw_boxplot = function(locations, width, ymin, lower, middle, upper, ymax, idx){
local_df = tibble(locations = locations, width = width, ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax, idx = idx)
ggplot(data = local_df) +
geom_rect(aes(xmin = locations - width/2, xmax = locations + width/2, ymin = lower, ymax = upper), fill = 'white', colour = 'black') +
geom_segment(aes(x = locations - width/2, xend = locations + width/2, y = middle, yend = middle), size = 0.8) +
geom_segment(aes(x = locations, xend = locations, y = upper, yend = ymax)) +
geom_segment(aes(x = locations, xend = locations, y = lower, yend = ymin)) +
facet_wrap(~idx, scales = 'free_x')
}
make_boxplot = function(to_plot){
to_plot %>%
cmp_boxplot %>%
(function(x){
draw_boxplot(locations = x$x, width = x$width, ymin = x$y0, lower = x$y25, middle = x$y50, upper = x$y75, ymax = x$y100, idx = x$idx)
})
}
cmp_boxplot = function(to_plot){
to_plot %>%
group_by(idx) %>%
mutate(width = 0.6*(max(x) - min(x))/length(unique(x))) %>% #hand specified width
group_by(x) %>%
mutate(y0 = min(y),
y25 = quantile(y, 0.25),
y50 = median(y),
y75 = quantile(y, 0.75),
y100 = max(y)) %>%
select(-y) %>%
unique()
}
c(1:4,7) %>%
c(.,10*.) %>%
lapply(FUN = function(x) {tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))}) %>%
bind_rows() %>%
make_boxplot
Result:
Since
geom_boxplot
doesn't allow varyingwidth
as an aesthetic, you have to write your own. Fortunately it's not too complicated.This function does all the computation and returns dataframes suitable for use in
stat_summary
. Then we call it in several different layers to construct the various bits of a boxplot. Note that you need to compute the width of the boxplot per group of the facet, done below usingdplyr
in your pipe. I calculated the width such that the range of x gets split up into equal segments based on the number of unique x values, then each box gets about 1/2 the width of that segment. Your data may need a different adjustment.As for reporting this as a bug (actually a desired feature), I think it's an infrequent enough use case that they won't prioritize it. If you wrap this code up into a custom
geom
(based on here) and submit a pull-request, you might get more luck.