Using MICE for Growth Curve Model


I used MICE to impute data, and now I am trying to do growth curve modeling. I'm at the stage of assessing whether multilevel modeling is needed. Here is my code:

library(nlme)
ICept <- gls(edeqGLOBAL_mean ~ 1, data = Imputed, method = "ML", na.action = na.exclude)
RICept <- lme(edeqGLOBAL_mean ~ 1, data = Imputed, random = ~1 | ID, method = "ML", na.action = na.exclude, control = lmeControl(opt = "optim"))

And this is the error message I am getting:

Error in as.data.frame.default(data, optional = TRUE) : cannot coerce class ‘"mids"’ to a data.frame

Any help as to what to do?


First of all, you need to understand what multiple imputation means: it creates several imputations for each missing value. A mids object therefore holds your original data together with m sets of imputed values, so it effectively represents m complete data sets that differ only in the values filled in for the missing entries. The variance between these imputations represents your uncertainty about the missing data.
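To see what a mids object actually contains, here is a small sketch that uses the nhanes example data shipped with mice as a stand-in for your own data (the choice of m = 5 and the seed are arbitrary). complete() is the function that turns the mids object into ordinary data frames.

```r
library(mice)

# nhanes: small example data set from mice with missing values in bmi, hyp and chl
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

class(imp)    # "mids", not a data.frame
imp$m         # number of imputations
imp$imp$bmi   # imputed bmi values: one row per missing entry, one column per imputation

# Extract ordinary data frames from the mids object
first    <- complete(imp, action = 1)      # first completed data set
all_sets <- complete(imp, action = "all")  # list of all m completed data sets
```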

Because a mids object is not a data.frame, you cannot use it the same way (this is what the "cannot coerce class" error is telling you). Analyzing multiply imputed data involves two steps: first, apply the analysis to each imputed data set; second, aggregate the results (i.e. model coefficients etc.) according to Rubin's rules, so that you get an overall estimate as well as standard errors that include the variance between imputations.
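The second step can be illustrated with a minimal base-R sketch of Rubin's rules for a single scalar parameter: the pooled estimate is the mean of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name rubin_pool and the five toy estimates/variances below are made up for illustration; in practice mice's pool() and pool.scalar() do this (plus degrees-of-freedom calculations) for you.

```r
# Rubin's rules for one scalar parameter (illustrative helper, not part of mice)
# Q: vector of m point estimates, one per imputed data set
# U: vector of m within-imputation variances (squared standard errors)
rubin_pool <- function(Q, U) {
  m    <- length(Q)
  qbar <- mean(Q)                 # pooled point estimate
  ubar <- mean(U)                 # average within-imputation variance
  b    <- var(Q)                  # between-imputation variance (denominator m - 1)
  t    <- ubar + (1 + 1 / m) * b  # total variance
  list(estimate = qbar, within = ubar, between = b, total = t, se = sqrt(t))
}

# Toy numbers for five imputations (invented for the example)
Q <- c(2.1, 2.3, 2.0, 2.2, 2.4)
U <- c(0.04, 0.05, 0.04, 0.05, 0.04)
res <- rubin_pool(Q, U)
res$estimate  # pooled estimate
res$se        # pooled standard error, including between-imputation variance
```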

For several statistical functions (e.g. lm, glm, anova), the mice package provides an easy implementation of these two steps. A simple linear regression, for instance, can be conducted on a mids object like this:

library(mice)
lm1_mira <- with(mydata_mids, lm(y ~ x1 + x2)) # with.mids() creates a `mira` object
pool(lm1_mira)                                  # combines the m fits with Rubin's rules
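For a version you can run as-is, the same two steps can be tried on the nhanes example data that ships with mice; the model bmi ~ age is chosen only for illustration.

```r
library(mice)

# Impute the nhanes example data (m = 5 and the seed are arbitrary choices)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)  # mids object

# Step 1: fit the model in every completed data set -> mira object
fit <- with(imp, lm(bmi ~ age))

# Step 2: pool the m fits with Rubin's rules -> mipo object
pooled <- pool(fit)
summary(pooled)
```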

For nlme::lme() and nlme::gls(), however, these methods are not readily implemented, so you will have to do a bit of programming instead. Specifically, your code should involve the following steps:

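  1. Create a function that conducts your analysis and outputs the relevant coefficients/estimates and their standard errors. This may look something like this (for gls and lme fits, the coefficient table of the summary is stored in $tTable, with columns Value, Std.Error, t-value and p-value):
    ICept_fun <- function(dat) summary(gls(edeqGLOBAL_mean ~ 1, method = "ML", data = dat))$tTable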
  2. Apply the function to each imputed dataset.
    a. Extract datasets from the mids object:
    Imputed_list <- lapply(1:Imputed$m, function(i) complete(Imputed, action=i))
    b. Apply function:
    ICept_list <- lapply(Imputed_list, ICept_fun)
  3. Pool the results using the pool.scalar() function from mice. It is meant for pooling any normally distributed (!) statistic calculated from a multiply imputed data set. This also means that you may have to transform some of your statistics of interest before applying pool.scalar() and back-transform them afterwards; for example, correlations should be transformed to Fisher's Z. Please see this useful vignette if you are not sure about your statistics of interest.
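    The pool.scalar() function wants numeric vectors of estimates (argument Q) and the respective variances (argument U), so you need to reshape the list of results a bit. Depending on what ICept_fun returns, this may look like the following (sapply() returns the numeric vectors that pool.scalar() expects, whereas lapply() would return lists):
    ICept_Qs <- sapply(ICept_list, function(x) x["(Intercept)", 1])   # point estimates
    ICept_Us <- sapply(ICept_list, function(x) x["(Intercept)", 2]^2) # squared SE = variance estimate
    ICept_pooled <- pool.scalar(Q = ICept_Qs, U = ICept_Us)           # pooled estimate in $qbar, total variance in $t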
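Putting steps 1 to 3 together, here is a sketch you can run end to end. It assumes the nhanes example data from mice as a stand-in for the real data, with bmi playing the role of edeqGLOBAL_mean; because nhanes has no ID or time structure, only the intercept-only gls model is shown. The same $tTable extraction also works on summary() of an lme fit (it holds the fixed effects).

```r
library(mice)
library(nlme)

# Stand-in data: impute nhanes (m = 5, arbitrary seed)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 1: analysis function returning the coefficient table
# (columns "Value", "Std.Error", "t-value", "p-value")
ICept_fun <- function(dat) {
  summary(gls(bmi ~ 1, data = dat, method = "ML"))$tTable
}

# Step 2a: list of completed data sets
Imputed_list <- lapply(seq_len(imp$m), function(i) complete(imp, action = i))
# Step 2b: fit the model in each completed data set
ICept_list <- lapply(Imputed_list, ICept_fun)

# Step 3: numeric vectors of estimates and variances, then pool
Qs <- sapply(ICept_list, function(x) x["(Intercept)", "Value"])
Us <- sapply(ICept_list, function(x) x["(Intercept)", "Std.Error"]^2)
pooled <- pool.scalar(Q = Qs, U = Us, n = nrow(nhanes), k = 1)

pooled$qbar     # pooled intercept
sqrt(pooled$t)  # pooled standard error (square root of total variance)
```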

If it turns out you need multilevel modelling, please be aware:
Multiple imputation of multilevel data brings its own additional problems. The imputation itself should take the multilevel structure of the data into account. If you simply applied mice::mice() with its default settings to your long-format data set, the imputation model does not account for the repeated measurements being nested within participants, which is not correct. One alternative is to create a wide-format data set, conduct the multiple imputation, and then reshape the resulting imputed data sets back to long format. In that case, reshaping back to long format would take place between steps 2a and 2b described above. I do not know whether this is the preferred method. There are some good sources on this, for instance this vignette.
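For the wide-format route, the reshaping back to long format can be done with base R's reshape(). The sketch below assumes a hypothetical wide layout with one row per ID and columns edeq_t1, edeq_t2, edeq_t3 for three measurement occasions; the column names, the time coding 0:2 and the values are invented for illustration.

```r
# Hypothetical completed wide-format data set: one row per ID
wide <- data.frame(
  ID      = 1:4,
  edeq_t1 = c(2.1, 3.4, 1.8, 2.9),
  edeq_t2 = c(2.0, 3.1, 1.9, 2.5),
  edeq_t3 = c(1.7, 2.8, 1.6, 2.2)
)

# Convert one completed wide data set to long format
to_long <- function(dat) {
  long <- reshape(dat,
                  direction = "long",
                  varying   = c("edeq_t1", "edeq_t2", "edeq_t3"),
                  v.names   = "edeqGLOBAL_mean",  # name of the stacked outcome
                  timevar   = "time",
                  times     = 0:2,                # occasion coding (assumed)
                  idvar     = "ID")
  long[order(long$ID, long$time), ]              # sort by person, then time
}

long <- to_long(wide)
head(long)
```

With real data you would apply the helper to every completed wide data set, e.g. Imputed_long <- lapply(Imputed_list, to_long), and then run step 2b on Imputed_long.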