Hi I recently run into the problem of GLMM estimation, specifically, the bias of the variance components (and the fixed-effects) estimation using the PQL method. Two key references are Brewlow&Clayton 1993 and Lin&Breslow 1995, which both claims that PQL under-estimates the variance of the random-effect, see Section 4.3 and Table 1 of the first paper, and Eq (3) and Fig 4 of the second paper. Both papers are highly cited so this should be well accepted.
However when I tried to replicate the simulation, the results are totally different. I coded in R and used MASS::glmmPQL(). Below are some quick code.
## Brewlow&Clayton 1993, Table 1, D=D1, m=1
require(MASS)
require(nlme)
require(lme4)
set.seed(123)
sim.breslow <- matrix(NA, 200, 4+1)
expit <- function(a) 1/(1+exp(-a))
K = 100 # 100 clusters
nn = rep(7, K) # size of each cluster
N = sum(nn)
pid <- rep(1:K, nn) # cluster id
xx = rep(c(1,0), each=K*7/2)
tt = rep(seq(-3,3), K)
X = cbind(tt, xx, tt*xx)
for(ii in 1:200){ # 200 reps
g.t <- rnorm(K, 0, 1) # random intercept
lp = -2.5 + X %*% c(1,-1, -0.5) + rep(g.t, each=7) # linear predictor
y = rbinom(N, 1, expit(lp))
dd = data.frame(y,X,pid=pid)
names(dd)[2:4] <- c('tt', 'xx', 'xt')
fit.pql = tryCatch(glmmPQL(fixed = y ~ tt+xx+xt, random = ~1|pid, family = 'binomial', data = dd, verbose=F),
error = function(e) NA)
if(length(fit.pql)!=1){
b.pql <- fit.pql$coef$fixed
D.pql <- as.numeric(VarCorr(fit.pql)[1,1])
}else{
b.pql <- rep(NA, 4)
D.pql <- rep(NA, 1)
}
sim.breslow[ii,] <- c(b.pql, D.pql)
cat('.')
}
res = apply(sim.breslow, 2, mean)
res = cbind(c(-2.5, 1,-1, -0.5, 1), res)
row.names(res) <- c('a0', 'a1', 'a2', 'a3', 's00')
colnames(res) <- c('true', 'PQL')
round(res, 2)
# true PQL
# a0 -2.5 -2.61
# a1 1.0 1.04
# a2 -1.0 -0.93
# a3 -0.5 -0.52
# s00 1.0 1.72
Notice that the var component s00 is over-estimated (1.72 compared to the truth 1.0), this is different from the paper claims (under-estimated as 0.68 while the truth is 1.0). I also tested other settings in the two papers and the same conclusion, I can post more simulation code if necessary. I checked the source code of glmmPQL and seems no problem.
So my questions are
- Why glmmPQL result differs from the conclusion the references claimed? Did I miss anything?
- Are there any other functions I can use to cross-validate? MASS::glmmPQL is the only one I can find to do PQL.
- BTW, anyone aware of any packages that do the bias correction of Lin&Breslow 1995?
Thanks!