Inconsistent predictions from lmList and data.table

97 Views Asked by At

I'm working on generating predictions from an lmList object holding some variables constant, but am finding inconsistent predictions between using a full cartesian product of data and using only a few rows. It doesn't seem to matter if I use lme4 or nlme versions of lmList.

library(data.table)
library(boot)
library(lme4)

set.seed(31)

# SET PRIMARY ATTRIBUTES
x2 <- c(-1.5800,-1.0660,-0.7640,-0.4970,-0.2395,0.0250,0.3140,0.6535,1.1350,2.0750)
x1 <- c(-1, 0, 1)
grp <- factor(letters[1:5])

# SET MEANS FOR EACH GROUP
y_grp_mn <- runif(5, min = 0.3, max = 0.6)

# EXPAND GRID
dt <- expand.grid(x1, x2, grp)
names(dt) <- c("x1", "x2", "grp")

# EXPAND DT
dt <- rbind(dt, dt, dt, dt, dt)

# SET BINARY OUTCOME WITH y_grp_mn PROBABILITIES
dt$y <- as.numeric(lapply(dt$grp, function(x) { 
  ifelse(rnorm(1, mean = y_grp_mn[as.numeric(x)]) > 0.5, 1, 0 )
  }))

# CONVERT TO DATA TABLE
dt <- data.table(dt)

# GENERATE A LINEAR MODEL AT EACH LEVEL GRP
glm_list <- lmList(y ~ x1 + x2 | grp,
                   family = binomial(link = "logit"),
                   data = dt)


# CREATE A SIMPLE TEST DATASET FROM MEANS AT EACH LEVEL OF X1 
# GET MEANS AT EACH LEVEL OF GRP x X2
grp_means <- dt[, .(mn = mean(y)), by = .(grp, x2)]
# REPLICATE AT X1
pred_data <- rbind(data.table(grp_means, x1 = -1),
                        data.table(grp_means, x1 = 0),
                        data.table(grp_means, x1 = 1))

# GENERATE PREDICTIONS FOR TEST DATA
pred_data[ , preds := inv.logit(predict(glm_list, as.data.frame(pred_data)))]

# EXAMINE PREDICTIONS AT LEVELS OF X1 HOLDING GRP AND X2 CONSTANT 
pred_data[c(50,100,150),]
# grp    x2  mn x1     preds
# 1:   e 2.075 0.6 -1 0.4689207
# 2:   e 2.075 0.6  0 0.3400327
# 3:   e 2.075 0.6  1 0.6517386

# TRY TO REPLICATE PREDICTED VALUES ON SAME SUBSET OF FULL DATA
inv.logit(predict(glm_list, pred_data[c(50,100,150),]))
# e         e         e 
# 0.4725348 0.5642330 0.6517386 

As shown only the third row of the subset replicated. Is there something I'm missing?

Thanks.

0

There are 0 best solutions below