Error getting equation from stat_poly_eq (possibly due to layer specific data)


I'm trying to get the equation for predicted data from a logit model, graphed over the real data.

Using stat_poly_eq (from ggpmisc) works, until I ask it to show the equation.

Here is my function:

plot_logit_lab = function(log_mod){
  
  mod_frame = model.frame(log_mod)
  var_names = names(mod_frame)
  
  newdat = setNames(data.frame(seq(min(mod_frame[[2]]), max(mod_frame[[2]]), 
                                   len=100)),
                    var_names[2])

  newdat[var_names[1]] = predict(log_mod, newdata = newdat, type="response")
  
  the_plot <- ggplot() +
    geom_point(data=mod_frame, aes(x=mod_frame[[2]],
                                   y=mod_frame[[1]])) +   
    stat_poly_line(data=newdat, mapping=aes(x=newdat[[1]], 
                                            y=newdat[[2]])) +
    stat_poly_eq(data=newdat, mapping=aes(x=newdat[[1]], 
                                          y=newdat[[2]]),
                 use_label(labels=c("eq")))
  
  return(the_plot)
}

And the error I get trying to run it:

> Error in `check_subclass()`:
! `x` must be either a string or a <Geom> object, not a <uneval> object.
> Backtrace: 
> 1. global plot_logit_lab(car_logit)
> 2. ggpmisc::stat_poly_eq(...)
> 3. ggplot2::layer(...)
> 4. ggplot2:::check_subclass(x = geom)

Without the use_label argument, it works fine (showing an R2 value of 1 because of the predicted values) but the equation is what I'm interested in.

Also, here is my test data and how I'm using the function:

library(dplyr)  # provides %>%, mutate(), case_when(), count(), group_by()

car <- cars %>%
  mutate(d = case_when(dist < 50 ~ "less",
                        dist >= 50 ~ "more"),
         s = speed)

car$d <- factor(car$d, levels=c("less","more"))

car_sum <- car %>%
  count(s, d) %>%
  group_by(s) %>%
  mutate(s_n = sum(n),
         d_prob = n/sum(n))

car_logit <- glm(formula = d_prob ~ s,
                weight = s_n,
                family = binomial(link = "logit"),
                data = car_sum)

car_plot <- plot_logit_lab(car_logit)

car_plot

[Image: Error in check_subclass() from the ggpmisc attempt]

The data_logit_no_out function is basically the same as the car_sum step above.

There are 2 best solutions below

M-- On

I am not sure if you can use data = ... and use_label within stat_poly_eq at the same time (could be a bug, or something that I am missing at the moment). But in any case, if we shuffle things around, define the data within ggplot() and add the points using geom_point with another dataframe at the end, we can get the desired result.

library(ggplot2); library(ggpmisc); library(dplyr)

plot_logit_lab = function(log_mod){
  mod_frame = model.frame(log_mod)
  var_names = names(mod_frame)
  newdat = setNames(data.frame(seq(min(mod_frame[[2]]), max(mod_frame[[2]]), 
                                   len=100)), var_names[2])
  
  newdat[var_names[1]] = predict(log_mod, newdata = newdat, type="response")
  

  the_plot <- ggplot(data=newdat, aes(x=newdat[[1]], y=newdat[[2]])) +
    stat_poly_line() +
    stat_poly_eq(use_label(c("eq"))) +
    geom_point(data=mod_frame, aes(x=mod_frame[[2]], y=mod_frame[[1]])) +
    labs(x = names(newdat)[[1]], y = names(newdat)[[2]])
      
  return(the_plot)
}

car_logit <- cars %>%
  mutate(d = as.factor(ifelse(dist < 50, "less", "more")),
         s = speed) %>%
  count(s, d) %>%
  group_by(s) %>%
  mutate(s_n = sum(n), d_prob = n/sum(n)) %>% 
  glm(formula = d_prob ~ s,
      weight = s_n,
      family = binomial(link = "logit"),
      data = .)

car_plot <- plot_logit_lab(car_logit)
suppressWarnings(print(car_plot))

Created on 2024-02-19 with reprex v2.0.2

Pedro J. Aphalo On

(I am not sure what the intention of your code is, as you are fitting a linear regression with lm() to the predictions from a glm() fit.)
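If the goal is the equation of the logit model itself, rather than that of a straight line fitted to its predictions, the coefficients can be read directly from the glm() fit. A minimal base-R sketch, assuming a toy model on the built-in mtcars data (an illustrative stand-in, not the question's car_logit):

```r
# Toy binomial GLM on a built-in dataset (illustrative stand-in for car_logit).
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# Coefficients are on the link (log-odds) scale: logit(p) = b0 + b1 * wt
b <- coef(fit)
eq_text <- sprintf("logit(p) = %.3g %+.3g * wt", b[[1]], b[[2]])
print(eq_text)

# The response-scale curve is the inverse logit of that linear predictor,
# which is what predict(type = "response") returns.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))
p_manual  <- plogis(b[[1]] + b[[2]] * grid$wt)
p_predict <- predict(fit, newdata = grid, type = "response")
```

A string such as eq_text could then be placed on the plot with ggplot2's annotate("text", ...), which avoids refitting anything to the predicted values.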

Below I answer your question about why use_label() gives an error, and explain why "shuffling things around" worked in the previous answer.

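The problem is that the value returned by use_label() is a mapping to a computed value and must be passed as an argument to the mapping parameter. In your call, data and mapping are already supplied by name, so the unnamed use_label() value is matched by position to the next parameter, geom, which is why check_subclass() reports an <uneval> object. Function use_label() has a second argument, other.mapping, for cases like this, making it possible to concatenate other mappings to be passed as the argument to mapping (see the edited function definition below).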

Function use_label() is only for convenience, so alternatively, adding label = after_stat(eq.label) within the call to aes() could be used instead of the call to use_label(). (Not shown.)
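For illustration, the aes()-only alternative might look like the sketch below. The data frame line_dat and its columns are hypothetical toy data; the computed variable holding the equation in ggpmisc is named eq.label.

```r
library(ggplot2)
library(ggpmisc)

# Toy data (hypothetical): a data frame used only by the line and equation layers.
set.seed(1)
line_dat <- data.frame(x = 1:20)
line_dat$y <- 2 + 0.5 * line_dat$x + rnorm(20, sd = 0.3)

p <- ggplot() +
  stat_poly_line(data = line_dat, mapping = aes(x = x, y = y)) +
  # The label aesthetic is mapped to the computed variable in the same aes()
  # call as x and y, so nothing is passed by position.
  stat_poly_eq(data = line_dat,
               mapping = aes(x = x, y = y, label = after_stat(eq.label)))
print(p)
```

Because everything goes through the named mapping argument, this form works together with a layer-specific data argument.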

The answer by @M-- solves the problem by moving the mapping of other variables to the call to ggplot(), and passing the value returned by use_label() by position to the mapping parameter.
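The difference between the two calls comes down to how R matches unnamed arguments. The sketch below uses a hypothetical stand-in function, fake_stat(), that only mimics the leading parameters mapping, data and geom; it is not ggpmisc code, and the "uneval" objects are built by hand to imitate what aes() returns.

```r
# Hypothetical stand-in with the same leading parameters as a ggplot2 stat.
# It only reports which parameter received an object of class "uneval".
fake_stat <- function(mapping = NULL, data = NULL, geom = "text", ...) {
  c(mapping = inherits(mapping, "uneval"),
    geom    = inherits(geom, "uneval"))
}

# Hand-built stand-ins for the values of use_label() and aes().
lbl <- structure(list(label = quote(eq.label)), class = "uneval")
xy  <- structure(list(x = quote(x), y = quote(y)), class = "uneval")

# As in the question: data and mapping are named, so the unnamed argument
# fills the next free parameter, geom.
as_in_question <- fake_stat(data = NULL, mapping = xy, lbl)

# As in the answer by M--: the unnamed argument comes first and fills mapping.
as_in_answer <- fake_stat(lbl)

print(as_in_question)
print(as_in_answer)
```

In the first call geom receives the "uneval" object, which corresponds to the situation reported by check_subclass() in the question.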

plot_logit_lab = function(log_mod){
  
  mod_frame = model.frame(log_mod)

  var_names = names(mod_frame)
  
  newdat = setNames(data.frame(seq(min(mod_frame[[2]]), max(mod_frame[[2]]), 
                                   len=100)),
                    var_names[2])
  newdat[var_names[1]] = predict(log_mod, newdata = newdat, type="response")
  
  the_plot <- ggplot() +
    geom_point(data=mod_frame, aes(x=mod_frame[[2]],
                                   y=mod_frame[[1]])) +   
    stat_poly_line(data=newdat, mapping=aes(x=newdat[[1]], 
                                            y=newdat[[2]])) +
    stat_poly_eq(data = newdat, 
                 mapping = use_label(labels = c("eq"), 
                                     other.mapping = aes(x=newdat[[1]], 
                                                         y=newdat[[2]])))
  
  return(the_plot)
}

I haven't edited the rest of the code, but the mapping should normally not be done with the extraction operator ([[) inside aes().
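One common way to avoid the extraction operator when column names are only known as strings is the .data pronoun supported by ggplot2. A minimal sketch, with hypothetical toy data frames obs and pred standing in for mod_frame and newdat:

```r
library(ggplot2)

# Toy stand-ins for mod_frame / newdat (hypothetical data, not the question's).
obs  <- data.frame(s = c(4, 10, 15, 20, 25),
                   d_prob = c(0.1, 0.2, 0.5, 0.8, 0.9))
pred <- data.frame(s = seq(4, 25, length.out = 50))
pred$d_prob <- plogis(-4 + 0.27 * pred$s)

# Column names held as strings, as with var_names in the question.
x_var <- "s"
y_var <- "d_prob"

# .data[[name]] looks the column up in each layer's own data frame,
# so a single mapping can be shared by layers that use different data.
p <- ggplot(mapping = aes(x = .data[[x_var]], y = .data[[y_var]])) +
  geom_point(data = obs) +
  geom_line(data = pred)
print(p)
```

Unlike aes(x = newdat[[1]]), the mapping here refers to a column by name rather than embedding a copy of one particular data frame's values, so it stays valid when the layer's data changes.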