Plot only top layers of ggplot stat_density_2d/geom_density_2d in R

210 Views Asked by At

I'm trying to generate a contour plot with ggplot in R, but keeping only the top layers of the plot. So for instance, with the following toy data:

df <- data.frame(x=rnorm(2000, 10, 3), y=rnorm(2000, 10, 3))
stat_density_plot <- ggplot(df, aes(x, y)) +
                      geom_point() +
                      stat_density_2d(aes(fill = ..level..), geom = "polygon", bins=15) 
geom_density_plot <- ggplot(df, aes(x, y)) +
                      geom_point() +
                      geom_density_2d(bins = 15, color = "red") 

I would want to plot only the top 4 levels in stat_density_plot, and only the innermost 4 contours in the geom_density_plot.
I've been toying with the idea of generating the kernel density estimation myself (MASS::kde2d(df$x, df$y)) and manually removing all the rest of the layers, but I still don't know how to plot the result with ggplot.
Any insight about how to generate any of the two plots will be most welcome.

2

There are 2 best solutions below

0
On BEST ANSWER

You can user layer_data() to get the actual data used by ggplot to create the polygons / lines, and focus on the levels you want.

# original
geom_density_plot <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_density_2d(bins = 15, color = "red", linewidth = 2) # thicker for better visibility

# filtered for desired rings (group numbering goes from outermost to innermost
# so we reverse that before filtering for the first four groups, which now
# correspond to the innermost rings)
layer_data(geom_density_plot, 2L) %>% 
  mutate(group = forcats::fct_rev(group)) %>%
  filter(as.integer(group) <= 4) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(data = df) +
  geom_path(aes(group = group), color = "red", linewidth = 2)

result 1

# original
stat_density_plot <- ggplot(df, aes(x, y)) +
  geom_point() +
  stat_density_2d(aes(fill = after_stat(level)), # after_stat is used in more recent versions of ggplot; the `...level...` syntax is considered old now
                  geom = "polygon", bins=15) 

# set transparency of unwanted levels to zero (we don't filter out the unwanted
# levels here, as the full range of levels is required to match the colour palette
# of the original)
layer_data(stat_density_plot, 2L) %>% 
  mutate(group1 = forcats::fct_rev(group)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(data = df) +
  geom_polygon(aes(group = group, fill = level, 
                   alpha = ifelse(as.integer(group1) <= 4, 1, 0))) +
  scale_alpha_identity()

result 2

0
On

Disclaimer: This answer is based on the answers in this question How to plot a contour line showing where 95% of values fall within, in R and in ggplot2

You can manually calculate the kernel density esitmate and use the resulting data for better control over what you display on the plot by specifying the probability-breaks.

# calculate kde
kde <- MASS::kde2d(df$x, df$y, n = 100)

# process kde
dx <- diff(kde$x[1:2])
dy <- diff(kde$y[1:2])
sz <- sort(kde$z)
c1 <- cumsum(sz) * dx * dy
dimnames(kde$z) <- list(kde$x, kde$y)
dc <- melt(kde$z)
dc$prob <- approx(sz, 1 - c1, dc$value)$y

# set probability levels for plot
binwidth <- 1/15
prob <- c(0, binwidth, binwidth * 2, binwidth * 3, binwidth * 4)
# or prob <- c(0, 0.25, 0.5) for example


# plot with discrete levels
ggplot(dc, aes(x = Var1, y = Var2)) +
  geom_point(data = df, aes(x = x, y = y), alpha = 0.2) +
  geom_contour_filled(aes(z = prob, fill = after_stat(level)),
                      breaks = prob,
                      alpha = 0.9) +
  geom_contour(aes(z = prob),
               breaks = prob,
               color = "red",
               alpha = 0.9) +
  scale_fill_brewer(palette = "Blues",
                    direction = -1,
                    name = "probability") +
  labs(x = "x", y = "y")

# plot with continuous levels using level_low
ggplot(dc, aes(x = Var1, y = Var2)) +
  geom_point(data = df, aes(x = x, y = y), alpha = 0.2) +
  geom_contour_filled(aes(z = prob, fill = after_stat(level_high)),
                      breaks = prob,
                      alpha = 0.9) +
  geom_contour(aes(z = prob),
               breaks = prob,
               color = "red",
               alpha = 0.9) +
  labs(x = "x", y = "y", fill = "probability")

You had 15 bins in your original plot, so that is the base for the breaks in this example (1/15). The resulting plot will only show the top 4 contours.

enter image description here

enter image description here