I should say that although I'm learning glmnet for this problem, I've used the same dataset with other methods and it has worked fine.
In this process, I split my data into training and test sets, all formatted as matrices, and glmnet builds the model without complaining. However, when I try to run a prediction on the holdout set, it throws the following error:
glmfit <- glmnet(train_x_mat, train_y_mat, alpha = 1)
# Note: lambda.1se only exists on cv.glmnet fits; on a plain glmnet fit it is NULL
glmpred <- predict(glmfit, s = glmfit$lambda.1se, newx = test_x_mat)
# output:
Error in cbind2(1, newx) %*% nbeta :
Cholmod error 'A and B inner dimensions must match' at file ../MatrixOps/cholmod_ssmult.c, line 82
However, I know that train_x and test_x have the same number of columns:
ncol(test_x)
[1] 146
ncol(train_x)
[1] 146
I'm fairly new to glmnet; is there something more I need to do to make it cooperate?
Edit:
Here are the dimensions. Apologies for posting the vectors originally. This may be more at the heart of it.
dim(train_x_mat)
[1] 1411 208
dim(test_x_mat)
[1] 352 204
Which is strange, because they are created this way:
train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x, verbose = FALSE)
test_x_mat <- sparse.model.matrix(~ . - 1, data = test_x, verbose = FALSE)
For anyone else who's running into this problem even though it seems like they shouldn't be: the issue is specifically with R's sparse.model.matrix. It expands each factor into one indicator column per level, so a level that appears in only one of the two splits produces a column in only that matrix. Thus, if your dataset isn't particularly large, or some levels are rare, your training data and testing data can end up with different columns (208 versus 204 here).

A solution, then, is either to add extra all-zero columns to whichever matrix needs them, or to remove the columns that aren't shared by both. Of course, if you're building a model and expecting new data, the former is preferable. In any case, the problem is a sign that your dataset is too small for the job, or at least that some factor levels are too rare to show up in every split.
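
A minimal sketch of that fix, assuming both matrices carry column names (as they do when built by sparse.model.matrix). The helper name align_columns and the toy data frames are made up for illustration; they are not part of glmnet or Matrix. The helper pads the matrix with all-zero columns for anything the training matrix has and it lacks, then drops anything the model never saw and puts the rest in training order:

```r
library(Matrix)

# Hypothetical helper: give `mat` exactly the columns named in `cols`, in that order.
align_columns <- function(mat, cols) {
  missing_cols <- setdiff(cols, colnames(mat))
  if (length(missing_cols) > 0) {
    # All-zero sparse block for factor levels that never occur in `mat`.
    pad <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                        dims = c(nrow(mat), length(missing_cols)),
                        dimnames = list(NULL, missing_cols))
    mat <- cbind(mat, pad)
  }
  # Drop columns absent from `cols` and reorder to match the training matrix.
  mat[, cols, drop = FALSE]
}

# Toy data (illustrative only): "blue" occurs only in training, "yellow" only in test.
train_x <- data.frame(color = factor(c("red", "green", "blue")), size = c(1, 2, 3))
test_x  <- data.frame(color = factor(c("red", "green", "yellow")), size = c(4, 5, 6))

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

# Column sets differ before alignment; afterwards they match the training matrix.
test_x_aligned <- align_columns(test_x_mat, colnames(train_x_mat))
```

Running setdiff(colnames(train_x_mat), colnames(test_x_mat)) before aligning is a quick way to see which levels are responsible for the mismatch. Passing test_x_aligned as newx to predict should then avoid the dimension error, since its columns line up one-to-one with the coefficients.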