geom_smooth line different color based on Y intercept line

81 Views Asked by At

My data is like below. Two column, serial number (SL) and the expression value (log)

> df

SL  log
1   1.5
2   -2.5
3   1.0
4   2.5
5   -1.

> ggplot(df, aes(x = SL, y = log)) +
  geom_point(size = 0.5, alpha = 0.6, shape = 19, color = "gray") +
  geom_smooth(method = "loess", se = FALSE, linewidth = 0.5, span = 0.09) +
  geom_hline(yintercept = 0, color = "black", lwd = 0.5)

enter image description here

However, I want to fill the geome_smooth loess line to be filled with red color if it is above the Y=0 intercept line and green color if below Y=0 intercept line. Example figure below.

enter image description here

How can I do that?

3

There are 3 best solutions below

1
Jon Spring On

I don't think there's a super simple answer, since you first need to extract the loess line to use it with geom_area (which can't vary in fill) or geom_ribbon (which won't automatically separate regions of the same sign).

I like using ggbraid once we've extracted the loess line, as it will make a nice clean interpolation with separate fill regions.

Given some fake data:

df <- data.frame(x = 1:100, 
                 y = sin(seq(0,8, length.out = 100)) + sin(1:100))

We could make a plot and extract the loess curve using the approach here:

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth()
df2 <- ggplot_build(p)[[1]][[2]][,c("x","y")]

Then we have a slightly tricky problem involving interpolation, as detailed here and here. One simple solution is to use the ggbraid package, which identifies the exact crossing points and cleanly separates the fill areas.

# using ggbraid from remotes::install_github("nsgrantham/ggbraid")
ggplot(df, aes(x, y)) +
  ggbraid::stat_braid(aes(x, ymin = 0, ymax = y, fill = y < 0),
                      data = df2) +
  geom_point() +
  geom_line(data = df2) +
  geom_hline(yintercept = 0) +
  scale_fill_manual(values = c("red", "green"))

enter image description here

0
Jishan On

Thanks for all your answers. Following code worked perfectly:

smoothed_values <- predict(loess(log ~ SL, data = df, span = 0.09)) 
fill_color <- ifelse(smoothed_values >= 0, "red", "green")

plot <- ggplot(df, aes(x = SL, y = log)) +
  geom_point(size = 0.5, alpha = 0.6, shape = 19, color = "gray") +
  ggbraid::geom_braid(aes(SL, ymin = 0, ymax = smoothed_values, fill = fill_color), 
                      alpha = 0.5, lwd = 0.5, color = "gray3") +
  geom_hline(yintercept = 0, lwd = 0.25, color = "black") +
  labs(x = "Gene Order", y = "logFC") +
  scale_fill_identity(name = "Gene Expression", 
                      labels = c("Down", "Up"), guide = "legend", 
                      breaks = c("green", "red"), limits = c("green", "red")) +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black")) +
  scale_y_continuous(breaks = seq(-5, 5, by = 0.5))

enter image description here

0
M-- On

Here's an approach by interpolating the values between the rows where sign(y) changes, then creating two groups for stat_smooth data, one for positive and one for negative values, and finally recreating the plot with the new data.

First, we need to make a plot with the smoothed data.

library(ggplot2)
library(data.table)
# library(tidyr) ## we only need tidyr::fill
## example dataset
set.seed(123)
df.ex <- data.frame(s = 1:600, l = round(rnorm(600, sd = 2), 1))

## creating the first plot with stat_smooth; same fill color for + & -
ggplot(df.ex, aes(x = s, y = l)) +
  geom_point(size = 0.5, alpha = 0.6, shape = 19, color = "gray") +
  geom_hline(yintercept = 0, color = "black", lwd = 0.5) +
  stat_smooth(method = "loess", geom = "area", fill = "pink",
              se = FALSE, linewidth = 0.5, span = 0.09, 
              color = "blue",  alpha = 0.8) -> p; p

Then we extract the "smoothed data" from ggplot2 object, will find the x coordinates of where geom_smooth goes from positive y to negative or vice versa (i.e. y == 0), and will modify the data for the final plot.

## ggplot build to extract the smooth data
q <- ggplot_build(p)
#> `geom_smooth()` using formula = 'y ~ x'

smdat <- copy(q[[1]][[3]])
setDT(smdat)

## set the fill colors and group for negative and positive values
smdat[ , group := ifelse(y > 0 , 1, 2)][, fill := ifelse(y < 0 , "green", "red")]

## interpolation; taken from https://stackoverflow.com/a/27137211/6461462
smdat_grp <- smdat[ , {
  ix = .I[c(FALSE, abs(diff(sign(smdat$y))) == 2)]
  if(length(ix)){
    pred_x = sapply(ix, function(i) approx(x = y[c(i-1, i)], 
                                           y = x[c(i-1, i)], xout = 0)$y)
    rbindlist(.(.SD, data.table(x = pred_x, y = 0, group = 1, ymax = 0),
                     data.table(x = pred_x, y = 0, group = 2, ymax = 0)), 
              fill = TRUE)} else .SD}][order(x)]

## filling the NA values in the remaining columns 
lapply(split(smdat_grp, smdat_grp$group), \(dat) 
       tidyr::fill(dat, everything(), .direction = "downup")) |> 
    rbindlist() |> 
    as.data.frame() -> q[[1]][[3]]

## plot the modified ggplot
pq <- ggplot_gtable(q)

plot(pq)

As you can see, using ggbraid::geom_braid or ggbraid::stat_braid, although very close, gives us slightly different results from what we get from stat_smooth/geom_smooth.

smoothed_values <- predict(loess(l ~ s, data = df.ex, span = 0.09)) 
fill_color <- ifelse(smoothed_values >= 0, "red", "green")

ggplot(df.ex, aes(x = s, y = l)) +
  geom_point(size = 0.5, alpha = 0.6, shape = 19, color = "gray") +
  
  ggbraid::geom_braid(aes(s, ymin = 0, ymax = smoothed_values, fill = fill_color), 
                      alpha = 0.5, lwd = 0.5, color = "gray3") +
  geom_hline(yintercept = 0, lwd = 0.25, color = "black") +
  geom_smooth(method = "loess", se = FALSE, linewidth = 1, span = 0.09) +
  theme(legend.position = "none") +
  coord_cartesian(ylim = c(-1, 1)) ## zoom to see the difference
#> `geom_braid()` using method = 'line'
#> `geom_smooth()` using formula = 'y ~ x'

Created on 2024-03-07 with reprex v2.0.2