How to place stat_interval on facets using facet specific y-value? (R, ggplot)


I have several faceted histograms, each with its own y axis. I would like to overlay a dot and line representing the mean and 90% CI near the top of each facet using stat_interval. stat_interval lets you specify a y location for each line, but because each facet has its own y axis, a fixed y value does not land in the same region of the plot in every facet. Here is some code to show what I am dealing with:

library(dplyr)
library(ggplot2)
library(ggdist)

dfmeans <- as.integer(c(4,5,6))
N <- c(100, 150, 300)

data <- data.frame(dfmeans, N)
data <- as.data.frame(lapply(data, rep, data$N))

data <- data %>%
  mutate(values = rnorm(n=nrow(data), mean = dfmeans, sd = 0.5))

ggplot() +
  geom_histogram(data=data, aes(values), bins = 100) +
  stat_pointinterval(data=data, aes(x = values, y = 5, group = dfmeans)) +
  facet_wrap(. ~ dfmeans, scales = "free_y") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0))

Would it be possible to replace y=5 with an equation that represents a proportion of the y axis? Or is it better to try and overlay plots somehow?

stefan On

Update: IMHO, moving the point intervals inside the panels takes some effort and a somewhat hacky approach. The code below proceeds in three steps. First, draw the histograms. Second, get the maximum of the data range per panel, for which I use layer_scales. Third, build a dataset containing the values for the point intervals using median_qi and join the per-panel maxima to use for the y position. Then plot the point intervals using geom_pointinterval. Here I have set the y position to the maximum of the data range plus some expansion or padding. Additionally, the height of the point interval has to be set in line with that expansion so that the point intervals align across the panels:

library(ggplot2)
library(ggdist)
library(dplyr, warn.conflicts = FALSE)

set.seed(1)

p <- ggplot() +
  geom_histogram(data = data, aes(values), bins = 100) +
  facet_wrap(. ~ dfmeans, scales = "free_y") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0))

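# Get the max of the data range per panel.
# layer_scales(p, i, j) returns the scales of the panel in row i, column j;
# the three panels here sit in one row, so only j varies.
y_max <- data.frame(
  dfmeans = sort(unique(data$dfmeans)),
  y = sapply(1:3, \(j) {
    scale <- layer_scales(p, j = j)
    scale$y$range$range[2]
  })
)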

dat_interval <- data %>%
  group_by(dfmeans) |>
  median_qi(.width = c(.8, .95)) |>
  left_join(y_max, by = "dfmeans")

p +
  geom_pointinterval(
    data = dat_interval,
    aes(
      x = values,
      y = y * 1.05,
      xmin = values.lower, xmax = values.upper,
      ymax = y,
      height = .05 * y
    ),
    orientation = "y"
  )
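
As a possible alternative to calling layer_scales once per panel, the per-panel maximum could also be read from the computed histogram data. This is only a minimal sketch: it recreates the example data from the question (an assumption about its structure), assumes the histogram is the first layer, and uses panel_max as an illustrative name that does not appear in the answer above.

```r
library(ggplot2)

set.seed(1)

# Recreate the example data from the question (assumed structure:
# one row per observation, with group means 4, 5 and 6)
data <- data.frame(
  dfmeans = rep(c(4L, 5L, 6L), times = c(100, 150, 300))
)
data$values <- rnorm(nrow(data), mean = data$dfmeans, sd = 0.5)

p <- ggplot(data, aes(values)) +
  geom_histogram(bins = 100) +
  facet_wrap(. ~ dfmeans, scales = "free_y")

# Computed data of layer 1 (the histogram): one row per bin,
# with a PANEL column identifying the facet
built <- layer_data(p, 1)

# Height of the tallest bar in each panel
panel_max <- aggregate(count ~ PANEL, data = built, FUN = max)
panel_max
```

With the y expansion set to zero, as in the answer, the tallest bar should coincide with the upper end of the panel's data range, so panel_max$count could stand in for the y column of y_max.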

Original Answer: One easy option is to use y = Inf, which puts the point interval at the top of each facet panel. In the code below I also added some padding by expanding the y scale.

library(ggplot2)
library(ggdist)

set.seed(1)

p <- ggplot() +
  geom_histogram(data = data, aes(values), bins = 100) +
  stat_pointinterval(
    data = data,
    aes(
      x = values, group = dfmeans,
      y = Inf
    )
  ) +
  facet_wrap(. ~ dfmeans, scales = "free_y") +
  scale_y_continuous(expand = c(0, 0, 0, 1)) +
  scale_x_continuous(expand = c(0, 0))

p

If you don't want the point interval to be clipped off, you could set clip = "off", which however also requires getting rid of the strip background fill:

p +
  theme(
    strip.background.x = element_rect(fill = NA)
  ) +
  coord_cartesian(clip = "off")
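
The question asked about a position expressed as a proportion of the y axis. In the y = Inf example, expand = c(0, 0, 0, 1) adds a fixed padding of one count unit at the top, which is the same absolute amount in every panel. A sketch of a proportional variant follows, using ggplot2's expansion() helper with a multiplicative padding. The example data is recreated under the same assumptions as in the question, p_mult is an illustrative name, and the choice of mean_qi with .width = 0.9 is my reading of "mean and 90% CI", not something the answer above specifies.

```r
library(ggplot2)
library(ggdist)

set.seed(1)

# Recreate the example data from the question (assumed structure)
data <- data.frame(
  dfmeans = rep(c(4L, 5L, 6L), times = c(100, 150, 300))
)
data$values <- rnorm(nrow(data), mean = data$dfmeans, sd = 0.5)

p_mult <- ggplot(data, aes(x = values)) +
  geom_histogram(bins = 100) +
  # y = Inf pins the interval to the top edge of every panel
  stat_pointinterval(
    aes(y = Inf, group = dfmeans),
    point_interval = mean_qi,
    .width = 0.9
  ) +
  facet_wrap(. ~ dfmeans, scales = "free_y") +
  # No padding at the bottom, 10% of each panel's data range at the top
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  scale_x_continuous(expand = c(0, 0))

p_mult
```

Because the padding is multiplicative, the gap between the tallest bar and the interval should take up the same share of each panel regardless of its count range.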