Random effects from an Intercepts- and Slopes-as-Outcomes model


I am trying to reproduce in R the results of the Intercepts- and Slopes-as-Outcomes model from Hierarchical Linear Models by Raudenbush and Bryk, which uses the High School and Beyond dataset.

[Screenshot: model and results from the Raudenbush and Bryk book]

The hsb dataset comes from the mlmhelpr package and my R code is:

library(lme4)
library(mlmhelpr)

hsb = hsb   # copy the lazily loaded hsb data from mlmhelpr into the workspace

## school-level mean of ses, merged back into the student-level data
groupMeanses = aggregate(hsb$ses, list(hsb$id), FUN = mean)
names(groupMeanses) = c('id','groupMeanses')
hsb = merge(hsb, groupMeanses, by = c('id'))

## two centerings of ses: around the recomputed school mean (a)
## and around the meanses column shipped with the data (b)
hsb$ses_centered_a = (hsb$ses - hsb$groupMeanses)
hsb$ses_centered_b = (hsb$ses - hsb$meanses)

M3d = lmer(mathach ~ 1 + ses_centered_b * meanses + ses_centered_b * catholic 
      + (1+meanses|id) , REML = F, data=hsb, 
      control = lmerControl(optimizer = "bobyqa", boundary.tol = 1e-5))

summary(M3d)

and the results from it:

[Screenshot: results from the R lme4 code]

As you can see, the two sets of results (those from the Raudenbush and Bryk book and those from my R code) look pretty close, except for the random effects. I found a tutorial on the Internet that fits the same model with HLM 6 and reproduces the book's results correctly, so I must be doing something wrong in R.

[Screenshot: HLM6 results]

My question is: what modification should I make to my code, or what is the right way in R to get the correct results?


2 Answers

Answer by Ben Bolker

It's hard to be absolutely sure, but my best guess is that the various packages in R (I tried several; see below) are coming up with better fits to the data than HLM6. In particular, the HLM6 output you list reports a deviance (in this case, -2 times the log-likelihood) of 46501.875643; several different R packages report deviances of 46497.4 (about 4.4 deviance units lower). Lower deviances are better, so it seems that R's fits are actually better.
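(For reference, a minimal sketch, assuming the M3d fit from the question: on an ML fit, deviance() is just -2 times logLik(), so these are the numbers to compare with HLM6's reported deviance.)

deviance(M3d)      # ML deviance, comparable to HLM6's reported deviance
-2 * logLik(M3d)   # the same quantity computed from the log-likelihood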

In a little more detail: I fitted the same model with lme4::lmer, nlme::lme, and glmmTMB::glmmTMB, which all implement mixed-model fitting in (slightly) different ways. They all found singular fits, with small or minimal slope standard deviations (from 6e-6 to 0.05) and highly variable correlations between slopes and intercepts (which makes sense: as the slope SD goes to zero, the slope-intercept correlation becomes undefined). The deviances and the estimates of the intercept SD and residual SD were highly consistent across packages.

  package sd.(Intercept)           cor   sd.meanses sd.Observation deviance
1    lmer       1.520733 -1.0000000000 5.341325e-02       6.062259 46497.41
2     lme       1.520815 -0.0001895723 2.457569e-03       6.062254 46497.44
3 glmmTMB       1.520813  0.9723940794 5.643373e-06       6.062255 46497.44
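To see the singularity directly (a sketch, assuming the lmer fit M3d from the question), lme4 provides isSingular() and VarCorr():

isSingular(M3d)   # TRUE when the fit lies on the boundary of the parameter space
VarCorr(M3d)      # shows the small slope SD and the boundary (-1) intercept-slope correlation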

I also used the allFit() function to try a wide range of optimizers within lmer - these all gave consistent answers.

The next test would be to take the estimates from HLM6, evaluate the log-likelihoods from the various R packages (this is fairly straightforward with lmer and glmmTMB), and see if we recover the HLM6 deviance estimate. Unfortunately, HLM6 doesn't appear to report the intercept-slope covariance (or correlation).
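One possible way to do this with lmer (a sketch, not part of the original answer) is to request the profiled deviance function with devFunOnly = TRUE and evaluate it at a covariance matrix built from the HLM6 variance components. Here tau00, tau11, and sigma2 are placeholders to be filled in from the HLM6 output, and tau01 (the unreported covariance) would have to be guessed; also note that the profiled deviance optimizes the fixed effects and residual SD for the given theta, so this is only an approximate version of the check described above.

## sketch: evaluate lme4's profiled ML deviance at externally supplied variance components
## tau00, tau01, tau11 (level-2 covariance) and sigma2 (level-1 variance) are placeholders
dd <- lmer(mathach ~ 1 + ses_centered_b * meanses + ses_centered_b * catholic
           + (1 + meanses | id), REML = FALSE, data = hsb, devFunOnly = TRUE)
Tau   <- matrix(c(tau00, tau01, tau01, tau11), nrow = 2)
L     <- t(chol(Tau / sigma2))            # lower Cholesky factor of Tau / sigma^2
theta <- L[lower.tri(L, diag = TRUE)]     # lme4's theta parameterization
dd(theta)                                 # profiled deviance at these variance components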

library(lme4)
library(nlme)
library(glmmTMB)

## the same model fitted with three different mixed-model packages
M3d = lmer(mathach ~ 1 + ses_centered_b * meanses + ses_centered_b * catholic 
      + (1+meanses|id) , REML = FALSE, data=hsb, 
      control = lmerControl(optimizer = "bobyqa", boundary.tol = 1e-5))

M3e = glmmTMB(mathach ~ 1 + ses_centered_b * meanses + ses_centered_b * catholic 
      + (1+meanses|id) , REML = FALSE, data=hsb)

M3f = lme(mathach ~ 1 + ses_centered_b * meanses + ses_centered_b * catholic,
          random = ~ 1+meanses|id , method = "ML", data=hsb)


library(broom.mixed)
library(tidyverse)

## gather the random-effects SDs, correlation, and deviance from each fit into one table
fitlist <- list(lmer = M3d, lme = M3f, glmmTMB = M3e)

(fitlist
    |> purrr::map_dfr(tidy, effect = "ran_pars", .id = "package")
    |> select(package, term, estimate)
    |> mutate(across(term, ~gsub("_+",".", .)))
    |> mutate(across(term, ~gsub("cor.*", "cor", .)))
    |> pivot_wider(names_from = "term", values_from = "estimate")
    |> mutate(deviance = c(deviance(fitlist$lmer), deviance(fitlist$lme), 2*M3e$obj$fn()))
    |> as.data.frame()
)

## refit with all optimizers available to lmer and compare the random-effects estimates
aa <- allFit(M3d)

(tidy(aa, effect = "ran_pars")
    |> select(optimizer, term, estimate)
    |> pivot_wider(names_from = "term", values_from = "estimate")
    |> as.data.frame()
)
Answer by efiguero

The following code fixes the boundary (singular) fit issue, and the variance components also look more informative:

library(lme4)
library(mlmhelpr)

hsb = hsb   # copy the lazily loaded hsb data from mlmhelpr into the workspace

## school-level mean of ses, merged back into the student-level data
groupMeanses <- aggregate(hsb$ses, list(hsb$id), FUN = mean)
names(groupMeanses) <- c('id','groupMeanses')
hsb <- merge(hsb, groupMeanses, by = c('id'))

## ses centered around its own school's mean (group-mean centering)
hsb$ses_centered_a = (hsb$ses - hsb$groupMeanses)

## random intercept and random slope for the group-mean-centered ses, fitted with REML
M3h2 = lmer(mathach ~ 1 + ses_centered_a * meanses 
            + ses_centered_a * catholic       
            + (1+ses_centered_a|id), 
            REML = TRUE, data=hsb)

summary(M3h2)

Apart from fitting the model with REML, the improved code uses ses centered around its own high school's mean (ses_centered_a) instead of ses centered around meanses (ses_centered_b), for both the fixed and the random effects. Furthermore, note that in the R code that prompted the current question, the random-effects part used the meanses column rather than a centered ses variable.

[Screenshot: output of the new code]
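As a quick follow-up (not part of the original answer), here is a small sketch, assuming the data preparation from the question (which creates both centered columns) is still in the workspace, to check how much the two centerings actually differ and to confirm that ses_centered_a is centered within each school:

## sketch: compare the packaged meanses column with the recomputed school means,
## and confirm ses_centered_a has (essentially) mean zero within every school
summary(hsb$meanses - hsb$groupMeanses)            # discrepancy between the two centerings
summary(tapply(hsb$ses_centered_a, hsb$id, mean))  # ~0 for every school, by construction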