set.seed(1)
library(data.table)
data=data.table(STUDENT = 1:1000,
OUTCOME = sample(20:90, 1000, replace = TRUE),
X1 = runif(1000),
X2 = runif(1000),
X3 = runif(1000))
data[, X1 := fifelse(X1 > .9, NA_real_, X1)]
data[, X2 := fifelse(X2 > .78 & X2 < .9, NA_real_, X2)]
data[, X3 := fifelse(X3 < .1, NA_real_, X3)]
Say you have the data shown above and wish to impute values for X1, X2, and X3 while leaving STUDENT and OUTCOME out of the imputation.
I can do
library(mice)
dataIMPUTE=mice(data[, c("X1", "X2", "X3")], m = 1)
but how do I combine the imputed values from dataIMPUTE with STUDENT and OUTCOME? I am afraid of merging them incorrectly, which is why I am asking for advice.
One possibility is to use the complete data set in the imputation, but change the predictorMatrix so that STUDENT and OUTCOME are not used in the imputation model.
First, you need to run mice to extract the predictorMatrix (without calculating the imputation). Then you can set all columns to 0 that shouldn't be included in the imputation model. However, all your variables are still contained in your dataIMPUTE object:
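A minimal sketch of these steps, under a few assumptions: the dry run is done with maxit = 0, the object names init, pred and completed are hypothetical, and the data are rebuilt here with illustrative missing-value cut-offs so the block runs on its own. Because the imputation runs on the full table, no merge is needed afterwards.

```r
library(data.table)
library(mice)

set.seed(1)
n <- 1000
data <- data.table(STUDENT = 1:n,
                   OUTCOME = sample(20:90, n, replace = TRUE),
                   X1 = runif(n), X2 = runif(n), X3 = runif(n))

# Introduce missing values in X1, X2 and X3 (illustrative cut-offs)
data[X1 > .9, X1 := NA_real_]
data[X2 > .78 & X2 < .9, X2 := NA_real_]
data[X3 < .1, X3 := NA_real_]

# Dry run: maxit = 0 sets up the defaults without computing any imputation
init <- mice(data, maxit = 0)
pred <- init$predictorMatrix

# Columns of the matrix are predictors, rows are the variables being imputed.
# Zeroing these columns means STUDENT and OUTCOME predict nothing.
pred[, c("STUDENT", "OUTCOME")] <- 0

# Actual imputation on the full data set with the modified predictor matrix
dataIMPUTE <- mice(data, m = 1, predictorMatrix = pred,
                   printFlag = FALSE, seed = 1)

# The completed data keep STUDENT and OUTCOME in their original rows
completed <- as.data.table(mice::complete(dataIMPUTE, 1))
head(completed)
```

STUDENT and OUTCOME have no missing values, so mice leaves them untouched; they are simply carried along in the same row order as the imputed X1, X2 and X3.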