I am running a model using glmmTMB
where I have rank deficiency. I am now encountering warning messages related to the structure of the "newdata" when I try to generate predicted group means and standard errors. I have built a reproducible example here to show what I mean. I also receive similar warnings when trying to run the model in DHARMa. This problem for both predict and DHARMa seems to be with the structure of the "newdata". I am not sure how to fix this or how the rank deficiency control in glmmTMB is structured so that it is then affecting predict and DHARMa.
library(glmmTMB)
library(dplyr) ## for mutate_at, %>%
#Build example data
x <- c("A", "B", "C", "D")
(time <- rep(x, each=20, times=3)) #time factor
y <- c("exposed", "ref1", "ref2")
(lake <- rep (y, each=80)) #lake factor
set.seed(123)
min <- runif(n=240, min=4.5, max=5.5) #mins used in model offset
set.seed(123)
(count <- rnbinom(n=240,mu=10,size=100)) #randomly generated negative binomial data
#make data frame
dat <- as.data.frame(cbind(time, lake, min, count))
dat <- dat %>%
mutate_at(c('min', 'count'), as.numeric)
#remove one combination of factors to make example rank deficient (all observations from time A and lake ref1)
dat2 <- filter(dat, time!="A" | lake !="ref1")
Next create the model with rank deficiency. Predicting individual values is not a problem, but predicting group level means and se's throws an error
model <-glmmTMB(count~time*lake, family=nbinom1,
control = glmmTMBControl(rank_check = "adjust"),
offset=log(min), data=dat2)
#Predicting individual values is not a problem
predict(model, type = 'response', se.fit=FALSE) #This works
#Predicting group means. make a newdata data frame
pred_data <- data.frame(lake = c("exposed","exposed","exposed","exposed","ref1","ref1","ref1","ref2","ref2","ref2","ref2"))
pred_data$min <- c(1,1,1,1,1,1,1,1,1,1,1) #set it to something constant
pred_data$time <- c("A", "B", "C", "D", "B","C","D","A","B","C","D")
#predict means and se estimates
pred_data$pred_Freq<- predict(model, newdata = pred_data, type = 'response', se.fit=FALSE)
pred_data$pred_Freq_lwr<- pred_data$pred_Freq-predict(model, newdata = pred_data, type = 'response', se.fit=TRUE)$se.fit
pred_data$pred_Freq_upr<- pred_data$pred_Freq + predict(model, newdata = pred_data, type = 'response', se.fit=TRUE)$se.fit
I receive the following error. This missing factor combination here is not the one that is actually missing in the data. See above, it was time A and lake ref1 we removed.
Error in checkModelMatrix(getX(data.tmb1), getX(data.tmb0)) :
Prediction is not possible for unknown fixed effects: timeD:lakeref1
Probably some factor levels in 'newdata' require fitting a new model
As a side note: I get the same error when trying to run this rank deficient model in DHARMa. It also throws another error with performance, even though performance does work with glmmTMB. The error is not the same, but the model will run without the rank deficincy part just fine in performance.
res=DHARMa::simulateResiduals(model, n = 100)
Error in checkModelMatrix(getX(data.tmb1), getX(data.tmb0)) :
Prediction is not possible for unknown fixed effects: timeD:lakeref1
Probably some factor levels in 'newdata' require fitting a new model.
check_model(model)
Error: `check_model()` not implemented for models of class `glmmTMB` yet.