I used functions ggpredict()
and ggemmeans()
from package ggeffects
1.3.0 to calculate mean estimates and confidence intervals (hereafter: CI) for a mixed-effect model. These functions rely on predict()
and on emmeans()
and make their outputs ggplot-friendly. The values predicted/estimated by the two functions differ both in their mean values and in their CI. Why?
The following reproducible example is based on dataset RIKZ (Janssen e Mulder 2005; Zuur et al. 2007), which looks at how species richness (number of species) varies with the height of sampling stations compared to mean tidal level (NAP, in meters) and the level of exposure (factor with three levels: low, medium and high):
rm(list=ls())
if (!require(pacman)) install.packages('pacman'); library(pacman)
p_load(emmeans)
p_load(ggplot2)
p_load(ggpubr)
p_load(ggeffects)
p_load(lme4, lmerTest, glmmTMB)
p_load(RCurl)
# get data:
RIKZ <- read.csv(text = RCurl::getURL(
"https://raw.githubusercontent.com/marcoplebani85/datasets/master/RIKZ.csv"))
str(RIKZ)
# "Exposure" is a factor:
RIKZ$Exposure <- as.factor(RIKZ$Exposure)
Here I fit a generalised mixed-effect model with Poisson-distributed residuals to the data using glmmTMB()
:
mem1 <- glmmTMB(Richness ~ NAP+Exposure + (1 | Beach),
family="poisson",
data = RIKZ, REML=T)
Model predictions and CI according to ggeffects::ggpredict()
, without accounting for the uncertainty of random effects (see this page on why accounting for it or not):
richness.predicted <- ggpredict(mem1,
terms=c("NAP", "Exposure"), type="fixed")
Predictions and CI for the same model according to ggeffects::ggemmeans()
, without accounting for the uncertainty of random effects:
richness.emmeans <- ggemmeans(mem1,
terms=c("NAP", "Exposure"), type="fixed")
Graphs representing the two sets of model predictions and CI together with the observed data:
p1 <- plot(richness.predicted, add.data=T) +
labs(title="Predictions by ggpredict()") + ylim(0,45) +
theme(text = element_text(size = 15))
p2 <- plot(richness.emmeans, add.data=T) +
labs(title="Predictions by ggemmeans()") + ylim(0,45) +
theme(text = element_text(size = 15))
ggarrange(p1, p2, ncol=2, labels=c("(a)", "(b)"),
common.legend=T, legend="bottom")
Why do the two sets of mean estimates and confidence intervals differ?
UPDATE:
This page explains the differences between the two functions:
[...] effects returned by
ggpredict()
[are] conditional effects (i.e. these are conditioned on certain (reference) levels of factors), whileggemmeans()
returns marginal means, since the effects are “marginalized” (or “averaged”) over the levels of factors. However, these differences only apply to non-focal terms, i.e. remaining variables that are not specified in the terms argument.
It continues:
When all categorical predictors are specified in terms and further (non-focal) terms are only numeric, results are also identical (as both
ggpredict()
andggemmeans()
use the mean value by default to hold non-focal numeric variables constant).
I suspect the reason for the differences in the predictions obtained by the two functions for mixed-effect models lies in how they handle random effects, but I don't know how.
REFERENCES:
- Janssen, G. and Mulder, S. (2005), Zonation of macrofauna across sandy beaches and surf zones along the Dutch coast. Oceanologia 47(2): 265-282.
- Zuur, A.F., Ieno, E.N., and Smith, G.M. (2007), Analysing ecological data. Springer Science & Business Media
- https://strengejacke.github.io/ggeffects/articles/introduction_randomeffects.html
- https://strengejacke.github.io/ggeffects/articles/technical_differencepredictemmeans.html
This really a comment, not a full answer, but perhaps it could point into the right direction to understand this subtle difference between
ggpredict
andggemmeans
which is actually a difference betweenpredict.glmmTMB
andemmeans
.glmmTMB
uses two estimation methods: ML (maximum likelihood) and REML (restricted maximum likelihood). One chooses one or the other method by setting the argumentREML
appropriately.If the estimation method is ML, then
ggpredict
andggemeans
give the same result. The difference occurs when the estimation method is REML and the model has random effects.For comparison I also fitted the Poisson GLMM with
lme4::glmer
. This function doesn't have amethod
option and uses ML. Again, no difference.