How to create an X matrix out of a dataframe in R?

I have a dataframe called "prediction_set" which contains y and all possible predictors. From this dataframe, I want to generate the y vector and the X matrix. I've tried the following code but unfortunately, it only generates an empty matrix (although it displays the column names). How can I solve this?

#store dataframe
prediction_set <- subset(df_clean, is.na(df_clean$lnpercapitaconsumption))

#create X matrix and y vector for prediction set

X_prediction_set <- model.matrix(lnpercapitaconsumption ~ ., prediction_set)
y_prediction_set <- prediction_set$lnpercapitaconsumption

A sample of my dataframe can be found below:

> dput(prediction_set[1:20, c(1, 74)])
structure(list(lnpercapitaconsumption = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), h_hhsize = c(1L, 3L, 
4L, 9L, 8L, 3L, 6L, 5L, 4L, 1L, 5L, 1L, 4L, 1L, 2L, 3L, 4L, 6L, 
5L, 4L)), row.names = c(NA, 20L), class = "data.frame")

There is 1 solution below

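All values of lnpercapitaconsumption in prediction_set are NA, because you select only the rows of df_clean where lnpercapitaconsumption is NA. By default, model.matrix() drops every row that has an NA in any variable of the formula, including the response, so no rows are left and you get an empty matrix with column names. If you do the following, for example: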

prediction_set$lnpercapitaconsumption <- rnorm(nrow(prediction_set))

(fill the variable with noise), you will see that the model matrix works as expected. Maybe you meant !is.na() rather than is.na()?
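To see the effect in isolation, here is a minimal sketch on a made-up data frame. The name toy and its five rows are stand-ins for illustration, not the asker's data. It assumes the default na.action option (na.omit) is in effect, which is what removes the rows.

```r
# Hypothetical stand-in for prediction_set: response entirely NA, one predictor.
toy <- data.frame(
  lnpercapitaconsumption = rep(NA_real_, 5),
  h_hhsize = c(1L, 3L, 4L, 9L, 8L)
)

# With the default na.action (na.omit), every row containing an NA in any
# model variable is dropped, and that includes the response.
X_empty <- model.matrix(lnpercapitaconsumption ~ ., toy)
dim(X_empty)  # row count should be zero, columns are still named

# Filling the response with noise lets all rows through.
toy_filled <- toy
toy_filled$lnpercapitaconsumption <- rnorm(nrow(toy_filled))
X_full <- model.matrix(lnpercapitaconsumption ~ ., toy_filled)
dim(X_full)   # one row per observation
```

Comparing the two dim() calls should show that only the row count differs; the columns are the same in both cases.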

Or maybe you want to make predictions based on a training set model? In that case, you don't need the Y or a model.
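If the goal is an X matrix for rows whose response is unknown (for instance, to pass to a fitting or prediction function that expects a numeric matrix, which is an assumption about the asker's workflow), two possible approaches are sketched below. The data frame toy is again a made-up stand-in.

```r
# Hypothetical stand-in for prediction_set.
toy <- data.frame(
  lnpercapitaconsumption = rep(NA_real_, 5),
  h_hhsize = c(1L, 3L, 4L, 9L, 8L)
)

# Option A: drop the response column and use a one-sided formula,
# so the NA response never enters the model frame.
predictors <- toy[, setdiff(names(toy), "lnpercapitaconsumption"), drop = FALSE]
X_a <- model.matrix(~ ., predictors)

# Option B: keep the two-sided formula, but build the model frame with
# na.action = na.pass so rows with NA values are not removed.
mf <- model.frame(lnpercapitaconsumption ~ ., toy, na.action = na.pass)
X_b <- model.matrix(lnpercapitaconsumption ~ ., mf)

dim(X_a)
dim(X_b)
```

Note that Option A still drops rows with NA in any predictor, while Option B keeps them and carries the NA values into the matrix; which behaviour is preferable depends on what the downstream model accepts.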