I want to obtain pooled average predictions per level of a categorical variable using a multinomial regression on multiple imputed data. My problem is that the pool()
function seems to collapse all the levels to produce one single estimated average prediction instead of keeping the stratification per category. Here is a step-by-step example:
- Using
rent
data fromlibrary(catdata)
I create 2 categorical variables
library("tidyverse")
library("catdata")
library("mice")
library("nnet")
library("marginaleffects")
data(rent)
rent<-rent%>%
mutate(rent_cat=factor(case_when(
rent<389.95~"Low price",
rent>=389.95&rent<700.48~"Medium price",
rent>=700.48~"High price"
)))%>%
mutate(size_cat=factor(case_when(
size<53~"Small",
size>=53&size<83~"Medium",
size>=83~"Big"
)))
- Run a multiple imputation by chained equation
micerent<-mice(rent,m=3)
- Obtain average prediction per level of a categorical variable
To be noted, based on this post it is better to build a function to extract predicted probabilties and to apply it to our imputed datasets in a complete format. I'm using the avg_predictions
following suggestion from marignaleffects
package tutorial.
fit_reg <- function(dat) {
mod <- multinom(rent_cat~ size_cat,data=dat)
out <- avg_predictions(mod,type="probs",by="size_cat")
return(out)
}
micecompleterent<- complete(micerent, "all")
model<-lapply(micecompleterent, fit_reg)
- Pool the average predictions
There, instead of producing average prediction estimation per levels of size_cat
, it seems to collapse all the stratified averages to give one pooled average prediction.
summary(pool(model),conf.int=T)
I also tried using newdata=datagrid(size_cat=c("Big","Medium","Small")
in the predictions
function from marginaleffects
package but the result is the same:
fit_reg_response <- function(dat) {
multinom(rent_cat~ size_cat,data=dat)
out <- predictions(mod,newdata=datagrid(size_cat=c("Big","Medium","Small")))
return(out)
}
There are two problems:
mice::pool
calls thetidy()
function on all objects. It expects the output oftidy()
to be a data.frame with aterm
column, buttidy
does not return such a column in this case, because there is not a robust and general way to say what a “term” is in the context of average predictions.tidy()
returns a one-row average estimate when applied to the output ofavg_predictions()
There is not a super straightforward way to solve this, but here's a workaround:
fit_reg()
function.tidy()
S3 method which creates theterm
column you want, based on the row identifiers you care about.Old code
New code