I used MICE to impute data, and now I am trying to do growth curve modeling. I'm at the stage of assessing whether there is a need for multilevel modelling. Here is my code:
library(nlme)
ICept <- gls(edeqGLOBAL_mean ~ 1, data=Imputed, method = "ML", na.action=na.exclude)
RICept <- lme(edeqGLOBAL_mean ~ 1, data=Imputed, random=~1|ID, method = "ML", na.action=na.exclude, control=lmeControl(opt="optim"))
And this is the error message I am getting:
Error in as.data.frame.default(data, optional = TRUE) : cannot coerce class ‘"mids"’ to a data.frame
Any help as to what to do?
First of all, you need to understand what multiple imputation means: it creates several imputations for each missing value. Therefore, the mids object is essentially a set of m data frames that have slightly different imputed values for the missing entries. The variance between these imputations represents your uncertainty about the missing data.

Because a mids object is not just a data.frame, you cannot use it the same way. Analyzing multiply imputed data involves two steps. First, apply the analysis to each imputed data set. Second, aggregate the results (e.g. model coefficients) according to Rubin's rules, so that you get an overall estimate as well as standard errors that include the variance between imputations.

For several statistical functions (e.g. lm, glm, anova), the mice package provides an easy implementation of these two steps. A simple linear regression, for instance, can be fitted to every imputed data set of a mids object with with() and then pooled with pool().

For nlme::lme() and gls(), however, these methods are not readily implemented, so you will have to do a bit of programming instead. Specifically, your code should involve the following steps:

1. Define a function that fits the model to a single data set and returns its coefficient table:
# $tTable holds Value and Std.Error columns; $coefficients would return only a plain vector of estimates
ICept_fun <- function(dat) summary(gls(edeqGLOBAL_mean ~ 1, method = "ML", data = dat))$tTable
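2. Apply the function to each imputed data set:
a. Extract the completed data sets from the mids object:
# mice::complete() avoids a clash with tidyr::complete() if tidyr is also loaded
Imputed_list <- lapply(1:Imputed$m, function(i) mice::complete(Imputed, action = i))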
b. Apply the function to each completed data set:
ICept_list <- lapply(Imputed_list, ICept_fun)
3. Pool the results using the pool.scalar() function from mice. It is meant for pooling any normally distributed (!) statistic that you calculated from a multiply imputed data set. This also means that you may have to transform some of your statistics of interest before you can apply pool.scalar() and back-transform them afterwards. For example, correlations should be transformed to Fisher's Z. Please see this useful vignette if you are not sure about your statistics of interest.

The pool.scalar() function wants vectors of estimates (argument Q) and respective variances (argument U), so you need to reshape the list of results a bit. This may or may not look like this, depending on your function ICept_fun:
ICept_Qs <- sapply(ICept_list, function(x) x["(Intercept)", 1])    # sapply() returns a plain vector, as pool.scalar() expects
ICept_Us <- sapply(ICept_list, function(x) x["(Intercept)", 2]^2)  # squared SE for variance estimate
ICept_pooled <- pool.scalar(Q = ICept_Qs, U = ICept_Us)
c(estimate = ICept_pooled$qbar, SE = sqrt(ICept_pooled$t))         # pooled estimate and its total-variance SE
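pool.scalar() applies Rubin's rules. To make explicit what that means, here is a base-R sketch of the same arithmetic for one scalar, followed by the Fisher's Z route for correlations. rubin_pool() is a hypothetical helper (not part of mice), the numbers are made up, and the sketch omits the degrees-of-freedom calculation that pool.scalar() also performs.

```r
# Hypothetical helper illustrating Rubin's rules for one scalar statistic.
# Q: estimates from the m imputed data sets; U: their squared standard errors.
rubin_pool <- function(Q, U) {
  m    <- length(Q)
  qbar <- mean(Q)                  # pooled estimate
  ubar <- mean(U)                  # average within-imputation variance
  b    <- var(Q)                   # between-imputation variance
  t    <- ubar + (1 + 1 / m) * b   # total variance
  list(qbar = qbar, ubar = ubar, b = b, t = t, se = sqrt(t))
}

# Made-up intercept estimates and squared SEs from m = 3 imputations
Q <- c(2.0, 2.2, 2.1)
U <- c(0.04, 0.05, 0.03)
pooled <- rubin_pool(Q, U)
pooled$qbar
pooled$se

# Correlations are not normally distributed, so pool them on the
# Fisher's Z scale (atanh) with approximate variance 1 / (n - 3),
# then back-transform the pooled value with tanh.
r_per_imputation <- c(0.30, 0.35, 0.32)  # made-up correlations
n <- 100                                  # assumed sample size
z_pooled <- rubin_pool(atanh(r_per_imputation),
                       rep(1 / (n - 3), length(r_per_imputation)))
r_pooled <- tanh(z_pooled$qbar)
r_pooled
```

For the same Q and U, pool.scalar(Q, U) should report the same qbar and t, since the total variance it computes is likewise ubar + (1 + 1/m) * b.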
If it turns out you need multilevel modelling, please be aware:
Multiple imputation of multilevel data brings its very own additional problems. In the imputation itself, you should take the multilevel structure of the data into account. If you just applied mice::mice() to your (long-format) data set, the imputation model treats the rows as independent observations, which is not correct. One alternative is to make a wide-format data set, conduct the multiple imputation on it, and then reshape the resulting imputed data sets back to long format. In this case, reshaping back to long format would take place between steps 2a and 2b described above. As to whether this is the preferred method, I do not know. There are some good sources about this, for instance this vignette.
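For the wide-format route, the reshaping step could be sketched with base R's reshape(). The wave columns edeq_t1 to edeq_t3 and the helper to_long() are hypothetical; adapt them to the actual wide data set you pass to mice().

```r
# Hypothetical helper: turns ONE completed wide data set (one row per ID,
# one column per wave) into the long format needed for gls()/lme().
to_long <- function(dat_wide, waves = c("edeq_t1", "edeq_t2", "edeq_t3")) {
  long <- reshape(dat_wide,
                  direction = "long",
                  varying   = waves,              # wide columns to stack
                  v.names   = "edeqGLOBAL_mean",  # name of the stacked outcome
                  timevar   = "time",
                  times     = seq_along(waves),   # wave index 1, 2, 3
                  idvar     = "ID")
  long <- long[order(long$ID, long$time), ]       # sort by person, then wave
  rownames(long) <- NULL
  long
}

# Made-up stand-in for one completed wide data set
wide_example <- data.frame(ID = 1:2,
                           edeq_t1 = c(2.1, 3.0),
                           edeq_t2 = c(1.8, 2.7),
                           edeq_t3 = c(1.5, 2.9))
long_example <- to_long(wide_example)
long_example
```

Between steps 2a and 2b this would become Imputed_list_long <- lapply(Imputed_list, to_long), after which the model function is applied to each element of Imputed_list_long instead of Imputed_list.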