How to compare GAMs that include random effects and factors?

514 Views Asked by At

I am trying to make a model comparison (say, for hypothesis testing) of two GAMs (mgcv package), where both models include random effects smooth term (s(bs="re")), and the second model additionally include a factor variable. So:

gm0 <- gam(y ~ s(subject, bs = "re"))
gm1 <- gam(y ~ fac + s(subject, bs = "re"))

For reference, I will use a corresponding pair of LMMs:

lmm0 <- lmer(y ~ (1 | subject))
lmm1 <- lmer(y ~ fac + (1 | subject))

As expected, the fixed and random effects parameters are very close between GAM and the LMM counterparts. The addition of fac improves the LMM fit, as it should be, but seemingly not for GAMs. Unlike in LMM, there is barely any difference in the reported likelihoods (and so AIC) between the two GAM models.

Therefore, my question is: how to compare the fit of such models?

Here's an example:

library(lme4)
library(mgcv)

# Data with one explanatory variable (2-level factor), and 'm' subjects:
n <- 100 # no. of observations
m <- 20  # no. of subjects
set.seed(666)
fac <- gl(2, n / 2, labels = LETTERS[1:2])
subject <- gl(m, n / m, labels = letters[1:m])
beta <- c(-1, 1)
y <- rnorm(n, beta[fac], .25) + rnorm(m, sd = 1.5)[subject]
## 

# LMM for comparison:
lmm0 <- lmer(y ~ (1 | subject), REML = FALSE)
lmm1 <- lmer(y ~ fac + (1 | subject), REML = FALSE)

# The two LMM differ by one *df* and the likelihood of
# the second model is higher:
anova(lmm0, lmm1)
## Models:
## lmm0: y ~ (1 | subject)
## lmm1: y ~ fac + (1 | subject)
##       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
##  lmm0    3 126.49 134.30 -60.243   120.49                       
##  lmm1    4 123.86 134.28 -57.928   115.86 4.6311  1     0.0314 *

# Now equivalent models with GAM and RE-smooth term:
gm0 <- gam(y ~ s(subject, bs = "re"), method = "ML")
gm1 <- update(gm0, . ~ . + fac)

# Fixed effect coefficients, their SE, as well as RE variance components are
# almost identical between GAM and LMM:
# FX:
cbind(coef(gm1)[1:2], fixef(lmm1))
# SE:
cbind(sqrt(diag(vcov(gm1)[1:2, 1:2])),
      sqrt(diag(vcov(lmm1))))
# RE variance components:
VarCorr(lmm1)
gam.vcomp(gm1)


# In GAM, unlike in LMM, there is virtually no difference in likelihood of the
# two models in a pair.

AIC(lmm0, lmm1, gm0, gm1)
##        df   AIC
## lmm0  3.0 126.5
## lmm1  4.0 122.9
## gm0  20.9  30.7 <--
## gm1  20.9  30.7 <--

# Curiously, this happens when 'fac' spans 'subject's, i.e. no subject takes
# two different levels of 'fac'. Reference dfs in 'gm0' (the null model) is 19,
# i.e. number of subjects - 1. But in 'gm1', where 'fac' is included, ref. df.
# is 18.
table(fac, subject)
##    subject
## fac a b c d e f g h i j k l m n o p q r s t
##   A 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0
##   B 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5

summary(gm0)
##  Approximate significance of smooth terms:
##              edf Ref.df     F p-value    
## s(subject) 18.92     19 248.6  <2e-16 ***

summary(gm1)
##              edf Ref.df   F p-value    
## s(subject) 17.91     18 208  <2e-16 ***

# However, when 'fac' intersects 'subject's, i.e. when single subject can encompass
# different levels of 'fac', ref. df is again m - 1 = 19.

gm1b <- gam(y ~ sample(fac) + s(subject, bs = "re"))
summary(gm1b)
##              edf Ref.df     F p-value    
## s(subject) 18.92     19 246.3  <2e-16 ***

0

There are 0 best solutions below