How to force `stat_poly_line()` to use a specific non-zero y-intercept?


Using stat_poly_line() from package 'ggpmisc', one can fit a polynomial to data, with lm() as the default method. You can force the fit through zero with either formula = y ~ x + 0 or formula = y ~ x - 1. However, I cannot find a way to force my linear model through a specific non-zero y-intercept. In this case, I need to force it through 5.05.

Note: I recognize linear models are rarely statistically useful when the y-intercept is forced, but in my case I believe it is fine.

Here is my data:

mydata <- structure(list(y = c(20.2, 29.74, 22.37, 24.51, 
37.2, 31.43, 43.05, 54.36, 65.44, 67.28, 46.02), x = c(0.422014140000002, 
1.09152966, 1.3195521, 3.54231348, 2.79431778, 3.40756002, 5.58845772, 
7.10762298, 9.70041246, 11.7199653, 15.89668266)), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

And here is a simplified version of my plot:

myplot <- ggplot(mydata, aes(x = x, y = y)) +
  stat_poly_line(se = FALSE, 
                 linetype = "dashed", 
                 na.rm = TRUE, 
                 formula = y ~ x + 0) +
  stat_poly_eq(use_label(c("eq", "R2", "adj.R2")), 
               na.rm = TRUE, 
               formula = y ~ x + 0) +
  geom_point(size = 2.5) 

The formula above fixes the intercept at 0. I tried putting 5.05 in place of the 0 to represent a y-intercept of 5.05 for the linear model (the x + 0 comes from the package's guide on how to put parabola intercepts at 0). This approach does not work, nor does putting the value on the y side of the formula.

I could use another package relatively quickly, but I feel like there is a simple solution I can implement here.

Any help?

Best answer, by Pedro J. Aphalo

Interesting question! And you are correct that there is a solution within 'ggpmisc'. However, it may take a bit of familiarization before it feels simple...

stat_poly_line() by default uses lm() as method. So, as with lm(), the straightforward way of doing what you want is to subtract 5.05 from all the y-values and fit with formula = y ~ x + 0. The slope from the fit will be the one you want, and the intercept is 5.05 by construction. So, you can use formula = I(y - 5.05) ~ x + 0. To get the correct line plotted, the subtracted value needs to be added back to the predicted values, which the statistic returns in y. For the equation, some plotmath trickery is needed to edit the equation label returned by the statistic.
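The arithmetic behind this can be checked outside of any plot. The following is a minimal base R sketch using the data from the question and the intercept of 5.05. The variable names (b0, fit_shift, fit_offset) are my own, and the offset() variant is an alternative way of writing the same lm() fit, not something 'ggpmisc' requires.

```r
# Data from the question
x <- c(0.422014140000002, 1.09152966, 1.3195521, 3.54231348, 2.79431778,
       3.40756002, 5.58845772, 7.10762298, 9.70041246, 11.7199653,
       15.89668266)
y <- c(20.2, 29.74, 22.37, 24.51, 37.2, 31.43, 43.05, 54.36, 65.44,
       67.28, 46.02)

b0 <- 5.05  # the intercept to force (hypothetical name)

# Approach described above: shift y, then fit through the origin
fit_shift <- lm(I(y - b0) ~ x + 0)

# Equivalent formulation: keep y as is and pass the fixed intercept as an offset
fit_offset <- lm(y ~ 0 + x + offset(rep(b0, length(x))))

slope_shift  <- unname(coef(fit_shift)["x"])
slope_offset <- unname(coef(fit_offset)["x"])

# Closed-form least-squares slope for a line forced through (0, b0)
slope_manual <- sum(x * (y - b0)) / sum(x^2)

# Predictions on the original scale: add the subtracted value back
pred <- b0 + slope_shift * x

all.equal(slope_shift, slope_offset)
all.equal(slope_shift, slope_manual)
```

Both all.equal() calls should report agreement, which is why only the slope needs to come from the fit and the constant can be added back afterwards.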

For the example below, I used 20 instead of 5.05 as it felt more reasonable for the example data you provided. (As a side note: se = TRUE could be used and would be valid, but it would require adding the intercept, 20 in my example, to ymax and ymin in addition to y.)
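The se = TRUE variant mentioned in the side note might look like the sketch below. It assumes that stat_poly_line() returns ymin and ymax alongside y when se = TRUE, as the note implies, and shifts all three by the same constant with after_stat(). The object names mydata_band and p_band are hypothetical, and I have not compared the resulting band against an independent calculation.

```r
library(ggpmisc)

# Same data as in the question, with upper-case column names
mydata_band <- data.frame(
  Y = c(20.2, 29.74, 22.37, 24.51, 37.2, 31.43, 43.05, 54.36, 65.44,
        67.28, 46.02),
  X = c(0.422014140000002, 1.09152966, 1.3195521, 3.54231348, 2.79431778,
        3.40756002, 5.58845772, 7.10762298, 9.70041246, 11.7199653,
        15.89668266)
)

p_band <- ggplot(mydata_band) +
  stat_poly_line(se = TRUE,
                 linetype = "dashed",
                 na.rm = TRUE,
                 # shift the fitted line and both band edges back by 20
                 mapping = aes(x = X,
                               y = stage(start = Y, after_stat = y + 20),
                               ymin = after_stat(ymin + 20),
                               ymax = after_stat(ymax + 20)),
                 formula = I(y - 20) ~ x + 0) +
  geom_point(mapping = aes(X, Y), size = 2.5) +
  ylab("y") +
  expand_limits(y = 0)

p_band
```

Because the intercept is fixed rather than estimated, the band should narrow to zero width at x = 0.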

library(ggpmisc)
#> Loading required package: ggpp
#> Loading required package: ggplot2

mydata <- structure(list(Y = c(20.2, 29.74, 22.37, 24.51, 
                               37.2, 31.43, 43.05, 54.36, 65.44, 67.28, 46.02), 
                         X = c(0.422014140000002, 
                               1.09152966, 1.3195521, 3.54231348, 2.79431778, 
                               3.40756002, 5.58845772, 
                               7.10762298, 9.70041246, 11.7199653, 15.89668266)), 
                    row.names = c(NA, -11L), 
                    class = c("tbl_df", "tbl", "data.frame"))

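myplot <- ggplot(mydata) +
  # Fit through the origin on (y - 20), then add 20 back to the predicted values
  stat_poly_line(se = FALSE,
                 linetype = "dashed",
                 na.rm = TRUE,
                 mapping = aes(x = X, y = stage(start = Y, after_stat = y + 20)),
                 formula = I(y - 20) ~ x + 0) +
  # Same fit for the label; append "+ 20" to the equation using plotmath syntax
  stat_poly_eq(mapping = aes(X, Y,
                    label = after_stat(paste(eq.label, "~+~20*\", \"*", rr.label))),
               na.rm = TRUE,
               orientation = "x",
               formula = I(y - 20) ~ x + 0) +
  geom_point(mapping = aes(X, Y), size = 2.5) +
  ylab("y") +
  # Extend the y axis to 0 so the forced intercept is visible
  expand_limits(y = 0)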

myplot

Created on 2023-10-13 with reprex v2.0.2