R glmnet: segmentation fault when using multinomial and pmax


I use the glmnet package to run multinomial lasso regression. When using family="multinomial" with a dataset of p variables and n samples, and pmax=x, a segmentation fault occurs if x is even (unless pmax>p; in that case pmax is most probably ignored, because it has no influence). An example:

n=100
p=20
require(glmnet)
D= as.data.frame(replicate(p, rnorm(n)))
D[,p] = as.factor(round(rnorm(n)))

lasso  <- glmnet(data.matrix(D[, -p]), D[, p], standardize=T, family="multinomial")         ## works
lasso  <- glmnet(data.matrix(D[, -p]), D[, p], standardize=T, family="multinomial", pmax=7) ## works, because it is odd
lasso  <- glmnet(data.matrix(D[, -p]), D[, p], standardize=T, family="multinomial", pmax=24) ## works, because pmax>p
lasso  <- glmnet(data.matrix(D[, -p]), D[, p], standardize=T, family="multinomial", pmax=10)## crashes

And the error message:

 *** caught segfault ***
address 0x22de58a8, cause 'memory not mapped'

Traceback:
 1: .Fortran("lognet", parm = alpha, nobs, nvars, nc, as.double(x),     y, offset, jd, vp, cl, ne, nx, nlam, flmin, ulam, thresh,     isd, intr, maxit, kopt, lmu = integer(1), a0 = double(nlam *         nc), ca = double(nx * nlam * nc), ia = integer(nx), nin = integer(nlam),     nulldev = double(1), dev = double(nlam), alm = double(nlam),     nlp = integer(1), jerr = integer(1), PACKAGE = "glmnet")
 2: lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,     nvars, jd, vp, cl, ne, nx, nlam, flmin, ulam, thresh, isd,     intr, vnames, maxit, kopt, family)
 3: glmnet(data.matrix(D[, -p]), D[, p], standardize = T, family = "multinomial",     pmax = 10)

My first question is: why? Is there a mathematical reason for this? (I suppose so...)

The second one is: isn't there a better solution than a segmentation fault, like a warning? Or just using pmax <- pmax - 1?

EDIT: OK, it seems to be a bit more complicated. Sometimes the segmentation fault only occurs if I execute the very same command, with the same value of pmax, a second time.

Additionally, I found this error

*** glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption (out): 0x0000000005c41720 ***
======= Backtrace: =========
....

for both even and odd values of pmax.

Now it looks more like a bug to me. Or is it?

EDIT 2: I run R 2.15.2 with glmnet 1.9-5 in a 64-bit Linux environment. I also get a segmentation fault on a different PC with 64-bit Ubuntu and R 3.0.2.


There are 2 answers below.


Here's what I get under R 3.0.2, 64-bit, Windows 7, glmnet 1.9-5:

lasso  <- glmnet(data.matrix(D[, -p]), D[, p], standardize=T, family="multinomial", pmax=10)
Warning message:
from glmnet Fortran code (error code -10005); Number of nonzero coefficients along the path exceeds pmax=10 at 5th lambda value; solutions for larger lambdas returned 

You didn't state your setup, so I can't comment on why you don't capture the error, but this message should give considerable insight :-) into why a pmax this small is causing trouble.

EDIT: to clarify: I do not get a segfault.
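When glmnet reports the problem as a warning (as in the output above) instead of crashing, the condition can be handled from R. The following is a minimal sketch, assuming the warning text contains "exceeds pmax" as in the message shown; the helper fit_with_pmax_retry and its arguments are hypothetical and not part of glmnet. It reruns the fit with a doubled pmax until the warning disappears or a cap is reached. It cannot help with an actual segfault, which takes down the whole R process.

```r
# Hypothetical helper (not part of glmnet): calls fit_fun(pmax) and, if a
# warning mentioning "exceeds pmax" is raised, retries with a doubled pmax
# until the warning no longer appears or pmax reaches max_pmax.
fit_with_pmax_retry <- function(fit_fun, pmax, max_pmax) {
  repeat {
    hit_limit <- FALSE
    fit <- withCallingHandlers(
      fit_fun(pmax),
      warning = function(w) {
        # Only intercept the pmax warning; any other warning propagates as usual.
        if (grepl("exceeds pmax", conditionMessage(w), fixed = TRUE)) {
          hit_limit <<- TRUE
          invokeRestart("muffleWarning")
        }
      }
    )
    if (!hit_limit || pmax >= max_pmax) {
      # truncated = TRUE means the cap was reached and the path is still cut short.
      return(list(fit = fit, pmax = pmax, truncated = hit_limit))
    }
    pmax <- min(2 * pmax, max_pmax)
  }
}

# Example use (assumes x is a numeric predictor matrix and y a factor):
# library(glmnet)
# res <- fit_with_pmax_retry(
#   function(pm) glmnet(x, y, family = "multinomial", pmax = pm),
#   pmax = 10, max_pmax = ncol(x))
```

Doubling keeps the number of refits logarithmic in the final pmax, and the truncated flag tells the caller whether the cap was still hit.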


I know this answer is late, but just in case someone runs into the same issue:

I had the same problem and solved it by following the steps in this question. Basically, whatever version of MATLAB you have, try

mex -v -setup

which, if it fails, gives a list of compilers to use. Install the most recent ones that are not newer than your MATLAB version (for 2016a, I used XE2016 and VS2015). Then

mex glmnetMex.F glmnet.f

will do the job.

On a Linux machine (with MATLAB 2014a), I tried

mex -largeArrayDims glmnetMex.F GLMnet.f

using the gcc compiler and it worked too.

Hope this helps. It took me weeks to figure this out.