I have a dataframe called "prediction_set" which contains y and all possible predictors. From this dataframe, I want to generate the y vector and the X matrix. I've tried the following code but unfortunately, it only generates an empty matrix (although it displays the column names). How can I solve this?
# keep only the rows where the response is missing
prediction_set <- subset(df_clean, is.na(df_clean$lnpercapitaconsumption))
# create X matrix and y vector for the prediction set
X_prediction_set <- model.matrix(lnpercapitaconsumption ~ ., prediction_set)
y_prediction_set <- prediction_set$lnpercapitaconsumption
A sample of my dataframe can be found below:
> dput(prediction_set[1:20, c(1, 74)])
structure(list(lnpercapitaconsumption = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), h_hhsize = c(1L, 3L,
4L, 9L, 8L, 3L, 6L, 5L, 4L, 1L, 5L, 1L, 4L, 1L, 2L, 3L, 4L, 6L,
5L, 4L)), row.names = c(NA, 20L), class = "data.frame")
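All of your y values are NA, because prediction_set is the subset of df_clean that contains only the rows where lnpercapitaconsumption is NA. model.matrix() builds a model frame first, and the default na.action (na.omit) drops every row that has an NA in any variable of the formula, including the response, so no rows are left. If you fill the variable with noise, for example, you will see that the model matrix works as expected. Maybe you meant !is.na() rather than is.na()?
Or maybe you want to make predictions based on a model fitted on a training set? In that case, you don't need y for the prediction set, or a new model.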
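As a minimal sketch of the points above, the following uses a small made-up data frame as a stand-in for prediction_set, with h_hhsize as the only predictor. The names X_empty, noisy, X_noisy, and predictors are illustrative and do not come from the question. It shows the empty result, the noise check, and one possible way to build X without the response, assuming the predictors themselves contain no missing values (rows with NA predictors would still be dropped).

```r
# Toy stand-in for prediction_set: the response is all NA, one predictor
prediction_set <- data.frame(
  lnpercapitaconsumption = rep(NA_real_, 5),
  h_hhsize = c(1L, 3L, 4L, 9L, 8L)
)

# 1) Original call: every row has an NA response, so na.omit drops all rows
X_empty <- model.matrix(lnpercapitaconsumption ~ ., prediction_set)
nrow(X_empty)

# 2) Fill the response with noise: the rows are kept
noisy <- prediction_set
noisy$lnpercapitaconsumption <- rnorm(nrow(noisy))
X_noisy <- model.matrix(lnpercapitaconsumption ~ ., noisy)
nrow(X_noisy)

# 3) For prediction, leave the response out and use a one-sided formula
predictors <- prediction_set[, setdiff(names(prediction_set),
                                       "lnpercapitaconsumption"),
                             drop = FALSE]
X_prediction_set <- model.matrix(~ ., predictors)
dim(X_prediction_set)
```

If the goal is prediction from a model fitted with lm() or similar on the training rows, passing prediction_set as newdata to predict() avoids building the matrix by hand.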