R's glmnet throwing "A and B inner dimensions must match", but they already do

I should say that although I'm learning glmnet for this problem, I've used the same dataset with other methods and it has worked fine.

In this process, I split my data into training and test sets, all formatted as matrices, and glmnet builds the model without complaining. However, when I try to run a prediction on the holdout set, it throws the following error:

glmfit <- glmnet(train_x_mat, train_y_mat, alpha = 1)
glmpred <- predict(glmfit, s = glmfit$lambda.1se, newx = test_x_mat)
# output:
Error in cbind2(1, newx) %*% nbeta : 
Cholmod error 'A and B inner dimensions must match' at file ../MatrixOps/cholmod_ssmult.c, line 82

However, I know that train_x and test_x have the same number of columns:

ncol(test_x)
[1] 146
ncol(train_x)
[1] 146

I'm fairly new to glmnet; is there something more I need to do to make it cooperate?

Edit:

Here are the dimensions of the matrices themselves. Apologies for originally posting the column counts of the data frames instead. This may be more at the heart of it.

dim(train_x_mat)
[1] 1411  208
dim(test_x_mat)
[1] 352 204

Which is strange, because they are created this way:

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x, verbose = FALSE)
test_x_mat <- sparse.model.matrix(~ . - 1, data = test_x, verbose = FALSE)

Best answer:

For anyone else who runs into this problem even though it seems they shouldn't: the issue lies in how R's sparse.model.matrix expands factors. It gives each level of a factor its own column. Thus, if a level is present in the training data but not in the test data (or vice versa), the two matrices end up with different columns, even though the original data frames have the same number of columns. This is more likely when the dataset isn't particularly large or when some levels are rare.
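To see which columns are causing the mismatch, it can help to compare the column names of the two matrices. The following is a minimal sketch on made-up toy data (the `num` and `colour` columns are hypothetical, not from the question), assuming the factor is built separately in each split so that its levels differ:

```r
library(Matrix)

# Hypothetical toy data: the factor in the test split lacks the level "blue"
train_x <- data.frame(num = c(1, 2, 3),
                      colour = factor(c("red", "green", "blue")))
test_x  <- data.frame(num = c(4, 5),
                      colour = factor(c("red", "green")))

# Same construction as in the question
train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

# The data frames have the same number of columns, the matrices do not
ncol(train_x) == ncol(test_x)
dim(train_x_mat)
dim(test_x_mat)

# Columns that exist in one matrix but not in the other
only_in_train <- setdiff(colnames(train_x_mat), colnames(test_x_mat))
only_in_test  <- setdiff(colnames(test_x_mat), colnames(train_x_mat))
print(only_in_train)
print(only_in_test)
```

Running the same two setdiff() calls on the real train_x_mat and test_x_mat should show which factor levels are missing from each side.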

A solution, then, is either to add all-zero columns to whichever matrix is missing them, or to remove the columns that aren't shared by both. If you're building a model and expect to predict on new data, the former is preferable, since the fitted model keeps every column it was trained on. Either way, the problem is a sign that your dataset may be too small for the number of factor levels it contains.
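As a rough sketch of the padding approach, the helper below pads the new matrix with all-zero columns for anything the training matrix has and it lacks, drops columns the model never saw, and restores the training column order. `align_columns` is a hypothetical name introduced here for illustration, not a function from glmnet or Matrix, and the toy data is made up:

```r
library(Matrix)

# Hypothetical helper: give `newx` exactly the columns named in `ref_cols`,
# in that order.
align_columns <- function(newx, ref_cols) {
  missing_cols <- setdiff(ref_cols, colnames(newx))
  if (length(missing_cols) > 0) {
    # Empty (all-zero) sparse block for the columns absent from newx
    pad <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                        dims = c(nrow(newx), length(missing_cols)),
                        dimnames = list(rownames(newx), missing_cols))
    newx <- cbind(newx, pad)
  }
  # Drop columns not in ref_cols and reorder to match the training matrix
  newx[, ref_cols, drop = FALSE]
}

# Toy data (assumed): "blue" and "green" are missing from the test split,
# and "purple" appears only in the test split
train_x <- data.frame(num = c(1, 2, 3),
                      colour = factor(c("red", "green", "blue")))
test_x  <- data.frame(num = c(4, 5),
                      colour = factor(c("red", "purple")))

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

test_aligned <- align_columns(test_x_mat, colnames(train_x_mat))
identical(colnames(test_aligned), colnames(train_x_mat))
```

The aligned matrix could then be passed to predict() as newx. Note that a row with a level unseen in training ("purple" here) ends up with zeros in every column for that factor. An alternative that avoids the mismatch in the first place is to give each factor the same set of levels in both data frames before building the matrices, since sparse.model.matrix keeps unused levels by default.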