How to extract the quadratic fit from stat_smooth when using a transformed axis

42 Views Asked by At

I am trying to extract the line for my quadratic fit so that I can work out the x coordinate at y = 0.1 but the lm() and the stat_smooth are giving me different fits because of the log transformation on my y axis.

Minimum dataset:

> xvals <- c(0,1,2,2.5,3,5,7.5,10)
> yvals <- c(1,0.65,0.425,0.45,0.26,0.085,0.0121667,0.000675)
> df <- cbind.dataframe(xvals,yvals)
> 
> ggplot(df, aes(x = xvals, y = yvals))+
> stat_smooth(method= 'lm', formula = y~poly(x,2), se = F, color = 'blue')+
> stat_function(fun=function(x) 0.93389 + -0.25608 *x + 0.01663*x^2, color = 'green')+
> geom_point(color = 'blue', shape = 22, size = 3)+
> geom_hline(yintercept = 0.1)+
> geom_vline(xintercept = 4.675868)+
> scale_y_continuous(trans = 'log10')

enter image description here

The stat_function coordinates are derived from an lm (I have also done poly(x,2) raw = T and it gives the same coordinates.

> lm(df, formula = yvals ~ I(xvals^2) + I(xvals))

and I have drawn the black lines by solving the equation so when y = 0.1 the lm gives me that x = 4.7. However as you can see this does not line up with the stat_smooth line because the equation is slightly differnet.

If I take the log transformation off the y axis, the lines are drawn on top of each other and the stat_smooth line crosses at 4.7.

My understanding from some searching is that this is caused by ggplot transforming the data before it does the polynomial fit unless I've just made a huge blunder in which case any help is welcomed.

Does anyone know how I can either transform the data I put into my lm or derive the equation stat_smooth is using to get the coordinates where the stat_smooth crosses y = 0.1.

1

There are 1 best solutions below

5
asd-tm On

The issue is that you are using log10 trans that is (if not modified) undefined for negative values. Trans substitutes fun return values in stat_function() with log10(0.93389 + -0.25608 *x + 0.01663*x^2). You dont have negative yvals but you have negative fitted values for 0.93389 + -0.25608 *x + 0.01663*x^2. See this sample:

ggplot(df, aes(x = xvals, y = yvals))+
stat_smooth(method= 'lm', formula = y~poly(x,2), se = F, color = 'blue')+
  stat_function(fun=function(x) log10(0.93389 + -0.25608 *x + 0.01663*x^2), color = 'green')+
geom_point(color = 'blue', shape = 22, size = 3)+
geom_hline(yintercept = 0.1)+
geom_vline(xintercept = 4.675868)

enter image description here

If you replace trans = 'log10' with trans = 'pseudo_log' you get matching curves:

ggplot(df, aes(x = xvals, y = yvals))+
stat_smooth(method= 'lm', formula = y~poly(x,2), se = F, color = 'blue')+
  stat_function(fun=function(x) 0.93389 + -0.25608 *x + 0.01663*x^2, color = 'green')+
geom_point(color = 'blue', shape = 22, size = 3)+
geom_hline(yintercept = 0.1)+
geom_vline(xintercept = 4.675868)+
scale_y_continuous(trans = 'pseudo_log')

enter image description here

FURTHER DEVELOPMENT OF ANSWER

Find my considerations in comments:

  # find coefficients of lm and rename them for easier readibility 
  cmdl <- lm(df, formula = yvals ~ I(xvals^2) + I(xvals)) %>% 
    coef() %>% 
    `names<-`(c("c","a", "b"))
    
  y <- 0.1 # your desired y value for which we are looking equation roots (x'es) 
  
  # as y = ax^2 + bx + c => ax^2 + bx + (c - y) = 0
  cmdl[['c']] <- cmdl[['c']] - y
  
  # solve quadratic equation (note: without any a == 0 check) and get two roots for the specific case
  x <- (-1*cmdl[['b']]+ c(1, -1) * sqrt(cmdl[['b']]^2 - 4 * cmdl[['a']] * cmdl[['c']])) / 
    (2*cmdl[['a']])
   
   # x values ==
   [1] 10.725647  4.675868

   # And plot the result
   ggplot(df, aes(x = xvals, y = yvals))+
   stat_smooth(method= 'lm', formula = y~poly(x,2), se = F, color = 'blue')+
   stat_function(fun=function(x) 0.93389 + -0.25608 *x + 0.01663*x^2, color = 'green')+
   geom_point(color = 'blue', shape = 22, size = 3)+
   geom_hline(yintercept = 0.1)+
   geom_vline(xintercept = 4.675868)+
     geom_vline(xintercept = x, color = "orange", lty = 2)

enter image description here