Is there a way to calculate standard errors for predictions using the geeglm function from geepack?

73 Views Asked by At

I want to visualize the results of my model. I do not want to obtain predictions for new individuals.

When I try to use the argument se.fit = TRUE in the predict function with a GEE fitted with geepack::geeglm() I get an error: Error in scale^2 : non-numeric argument to binary operator

Minimum reproducible example of the error:

Note: I am using here an autoregressive correlation structure because in my real data that's the right one to use.

library(geepack)

rm(list = ls())

data(mtcars)
mtcars$am <- as.character(mtcars$am)
mtcars <- mtcars[order(mtcars$cyl, mtcars$disp),]
m1 <- geeglm(mpg ~ disp * am, data = mtcars, id = cyl, corstr = "ar1")
summary(m1)

newd <- expand.grid(disp = seq(min(mtcars$disp), max(mtcars$disp), 1), am = c("0", "1"))

newd$pred <- predict(m1, newdata = newd, type = "response", se.fit = TRUE)

# Error in scale^2 : non-numeric argument to binary operator

At first, I thought I might be a feature, not a bug, since it could be uncommon to want to calculate standard errors for predictions when GEEs give you population estimates. But I found that the package glmtoolbox does give this standard errors.

Another example with glmtoolbox:

library(glmtoolbox)

rm(list = ls())

data(mtcars)
mtcars$am <- as.character(mtcars$am)
mod1 <- mpg ~ disp * am
mtcars <- mtcars[order(mtcars$cyl, mtcars$disp),]
m1 <- glmgee(mod1, id=cyl, family=gaussian(), data=mtcars, corstr="AR-M-dependent")
summary(m1)
newd <- expand.grid(disp = seq(min(mtcars$disp), max(mtcars$disp), 1), am = c("0", "1"))


newd$fit    <- as.data.frame(predict(m1, newdata = newd, type = "response", se.fit = TRUE))$fit
newd$se.fit <- as.data.frame(predict(m1, newdata = newd, type = "response", se.fit = TRUE))$se.fit


ggplot(newd, aes(x = disp, y = fit, color = am)) +
  geom_line() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
                  ymax = fit + 1.96 * se.fit, fill = am), alpha = 0.2) +
  ylab("fit (mpg)") +
  scale_y_continuous(limits = c(-5,35)) +
  theme_light()

This returns the kind of plot I want to visualize what the model is saying:

visualization of the glmgee model results

The problem with this approach is that the results are not the exact same results I get with geepack::geeglm().

From geepack::geeglm():

geeglm(formula = mpg ~ disp * am, data = mtcars, id = cyl, corstr = "ar1")

 Coefficients:
            Estimate  Std.err   Wald Pr(>|W|)    
(Intercept) 26.70389  1.98299  181.3  < 2e-16 ***
disp        -0.03235  0.00571   32.1  1.5e-08 ***
am1          5.04045  0.13026 1497.4  < 2e-16 ***
disp:am1    -0.01908  0.00374   26.0  3.4e-07 ***

And from glmtoolbox::glmgee():

Coefficients
            Estimate Std.Error z-value Pr(>|z|)
(Intercept)   26.026     1.691  15.391  < 2e-16
disp          -0.030     0.005  -6.333 2.40e-10
am1            6.207     0.296  20.982  < 2e-16
disp:am1      -0.024     0.006  -4.191 2.78e-05

In summary:

  • Any source for the differences between glmtoolbox::glmgee() and geepack::geeglm()?
  • Is there any way to calculate standard errors for geepack::geeglm()? I have been using that function for a long time and I am more comfortable with it.

I have tried different correlation structures, both functions return the same result with "independence" so the difference has to be either in how they deal with the correlation structure or somehow about the ordering of the data.

I have also adjusted the same concept using a mixed model, the results are of course similar but the confidence intervals are much wider (I could use some insight with this, I thought they would be narrower). I can provide the MRE for the mixed model if anyone wants to see it.

0

There are 0 best solutions below