Can I use different cutoff points for different groups with stat_density_ridges?


I have a dataframe with different groups ('label' column). For each label, I want to plot a null distribution obtained from bootstrapping (values are in the 'null' column) and the true statistic on top (value in the 'sc' column). Ideally, I would like the area after the statistic to have a different color, to mark that this is my p-value. Is this possible to do with stat_density_ridges?

Here is some example R code:

library(ggplot2)
library(tidyverse)
library(ggridges)

df <- data.frame()

for (label in LETTERS) {
  mean=rnorm(1,0.5,0.2)
  null = rnorm(1000,mean,0.1);
  sc = rnorm(1,0.5,0.2)
  df <- rbind(df, data.frame(label=label, null=null, sc=sc))
}

df <- df %>% 
  mutate(label=as.factor(label))

ggplot(df, aes(x = null, y = label))  +
  stat_density_ridges(scale=1.2,alpha = 1, size=1)+
  scale_x_continuous(limits=c(0,1),breaks=seq(0,1,0.2)) +
  geom_segment(aes(x=sc, xend=sc, y=as.numeric(label)-0.1, yend=as.numeric(label)+0.5), size=1) +
  coord_flip()

The resulting figure is this:

[image: ridge plot]

But ideally, I would like each ridge to be more like this:

[image]

That is, with the color changing after the sc value. Is that possible? Thanks :)

Best answer, by Quinten

You can map fill to an expression on ..x.. to color a ridge differently on either side of a cutoff. On its own, this uses one fixed x value, so the shaded area starts at the same place in every ridge. To get a cutoff per group, you can build the plot with ggplot_build and join in a separate dataframe that holds one threshold per label (the sc value; it is called p_value in the code below). With these thresholds you can conditionally change the fill color in the built layer. Here is some reproducible code:

library(ggplot2)
library(tidyverse)
library(ggridges)

df <- data.frame()

set.seed(7) # for reproducibility
for (label in LETTERS) {
  mean=rnorm(1,0.5,0.2)
  null = rnorm(1000,mean,0.1);
  sc = rnorm(1,0.5,0.2)
  df <- rbind(df, data.frame(label=label, null=null, sc=sc))
}

df <- df %>% 
  mutate(label=as.factor(label))

# One threshold per label: the observed statistic sc (stored here as p_value)
p_values = df %>% 
  group_by(label) %>% 
  summarise(p_value = unique(sc)) %>%
  mutate(label = as.integer(label)) # integer factor codes, to match the group column from ggplot_build

# plot
p <- ggplot(df, aes(x = null, y = label, fill = ifelse(..x.. < sc, "no sign", "sign"), group = factor(label)))  +
  stat_density_ridges(geom = "density_ridges_gradient",
                      scale=1.2,alpha = 1, size=1,
                      calc_ecdf = TRUE) +
  scale_fill_manual(values = c("red", "blue"), name = "") +
  coord_flip()
p
#> Picking joint bandwidth of 0.0224

# Modify layer
q <- ggplot_build(p)
#> Picking joint bandwidth of 0.0224
q$data[[1]] = q$data[[1]] %>%
  left_join(., p_values,
            by = c("group" = "label")) %>%
  # set both colors explicitly so the fixed cutoff from the first plot cannot leak through
  mutate(fill = case_when(x < p_value ~ "red",
                          TRUE ~ "blue")) %>%
  select(-p_value)
q <- ggplot_gtable(q)
plot(q)

Created on 2023-03-28 with reprex v2.0.2

As you can see in the last plot, the shaded areas now follow the sc value of each group in your dataframe.
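
If editing the built layer feels fragile, one alternative is to compute the densities yourself and flag each x position by whether it lies beyond that label's sc, then draw the result with geom_ridgeline. The following is only a minimal sketch under some assumptions: it uses five labels of toy data in the same shape as the question, base R's density() with its default bandwidth (so the curves will not exactly match the joint bandwidth picked by stat_density_ridges), and the names dens, beyond, height_scale and ridge_plot are made up for illustration.

```r
library(ggplot2)
library(ggridges)

set.seed(7)
labels <- LETTERS[1:5]

# Toy data shaped like the question: 1000 null values and one statistic (sc) per label
df <- do.call(rbind, lapply(labels, function(l) {
  data.frame(label = l,
             null = rnorm(1000, rnorm(1, 0.5, 0.2), 0.1),
             sc = rnorm(1, 0.5, 0.2))
}))

height_scale <- 0.1  # assumption: shrink density heights so neighbouring ridges overlap only a little

# One kernel density per label; `beyond` marks grid points at or past that label's sc
dens <- do.call(rbind, lapply(split(df, df$label), function(d) {
  k <- density(d$null, n = 512)
  data.frame(label = d$label[1],
             x = k$x,
             height = height_scale * k$y,
             beyond = k$x >= d$sc[1])
}))
dens$y <- as.integer(factor(dens$label, levels = labels))  # numeric baseline for each ridge

# Each label is drawn as two ridgelines (before and beyond sc), filled differently
ridge_plot <- ggplot(dens, aes(x = x, y = y, height = height, fill = beyond,
                               group = interaction(label, beyond))) +
  geom_ridgeline() +
  scale_y_continuous(breaks = seq_along(labels), labels = labels) +
  scale_fill_manual(values = c(`FALSE` = "red", `TRUE` = "blue"), name = "x >= sc") +
  coord_flip()
ridge_plot
```

Because the two pieces of each ridge are separate polygons, there can be a very thin gap at the cutoff; a finer grid (larger n in density()) makes it less visible. The cutoff here comes straight from the data, so nothing depends on the group numbering inside ggplot_build.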