I want to visualize the results of my model. I do not want to obtain predictions for new individuals.
When I try to use the argument se.fit = TRUE
in the predict
function with a GEE fitted with geepack::geeglm()
I get an error: Error in scale^2 : non-numeric argument to binary operator
Minimum reproducible example of the error:
Note: I am using here an autoregressive correlation structure because in my real data that's the right one to use.
library(geepack)
rm(list = ls())
data(mtcars)
mtcars$am <- as.character(mtcars$am)
mtcars <- mtcars[order(mtcars$cyl, mtcars$disp),]
m1 <- geeglm(mpg ~ disp * am, data = mtcars, id = cyl, corstr = "ar1")
summary(m1)
newd <- expand.grid(disp = seq(min(mtcars$disp), max(mtcars$disp), 1), am = c("0", "1"))
newd$pred <- predict(m1, newdata = newd, type = "response", se.fit = TRUE)
# Error in scale^2 : non-numeric argument to binary operator
At first, I thought I might be a feature, not a bug, since it could be uncommon to want to calculate standard errors for predictions when GEEs give you population estimates. But I found that the package glmtoolbox
does give this standard errors.
Another example with glmtoolbox
:
library(glmtoolbox)
rm(list = ls())
data(mtcars)
mtcars$am <- as.character(mtcars$am)
mod1 <- mpg ~ disp * am
mtcars <- mtcars[order(mtcars$cyl, mtcars$disp),]
m1 <- glmgee(mod1, id=cyl, family=gaussian(), data=mtcars, corstr="AR-M-dependent")
summary(m1)
newd <- expand.grid(disp = seq(min(mtcars$disp), max(mtcars$disp), 1), am = c("0", "1"))
newd$fit <- as.data.frame(predict(m1, newdata = newd, type = "response", se.fit = TRUE))$fit
newd$se.fit <- as.data.frame(predict(m1, newdata = newd, type = "response", se.fit = TRUE))$se.fit
ggplot(newd, aes(x = disp, y = fit, color = am)) +
geom_line() +
geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
ymax = fit + 1.96 * se.fit, fill = am), alpha = 0.2) +
ylab("fit (mpg)") +
scale_y_continuous(limits = c(-5,35)) +
theme_light()
This returns the kind of plot I want to visualize what the model is saying:
visualization of the glmgee model results
The problem with this approach is that the results are not the exact same results I get with geepack::geeglm()
.
From geepack::geeglm()
:
geeglm(formula = mpg ~ disp * am, data = mtcars, id = cyl, corstr = "ar1")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 26.70389 1.98299 181.3 < 2e-16 ***
disp -0.03235 0.00571 32.1 1.5e-08 ***
am1 5.04045 0.13026 1497.4 < 2e-16 ***
disp:am1 -0.01908 0.00374 26.0 3.4e-07 ***
And from glmtoolbox::glmgee()
:
Coefficients
Estimate Std.Error z-value Pr(>|z|)
(Intercept) 26.026 1.691 15.391 < 2e-16
disp -0.030 0.005 -6.333 2.40e-10
am1 6.207 0.296 20.982 < 2e-16
disp:am1 -0.024 0.006 -4.191 2.78e-05
In summary:
- Any source for the differences between
glmtoolbox::glmgee()
andgeepack::geeglm()
? - Is there any way to calculate standard errors for
geepack::geeglm()
? I have been using that function for a long time and I am more comfortable with it.
I have tried different correlation structures, both functions return the same result with "independence" so the difference has to be either in how they deal with the correlation structure or somehow about the ordering of the data.
I have also adjusted the same concept using a mixed model, the results are of course similar but the confidence intervals are much wider (I could use some insight with this, I thought they would be narrower). I can provide the MRE for the mixed model if anyone wants to see it.