How to fix PSOCKcluster error when running GBM

I am trying to fit a gbm model:

gbm(formula=loan_status~., data=mdTrnGBM, distribution = 'bernoulli', n.trees= 100, interaction.depth= 5, bag.fraction= 0.5, cv.folds= 5)

and keep getting this error:

Error in makePSOCKcluster(names = spec, ...) : 
  Cluster setup failed. 8 of 8 workers failed to connect. 

Any ideas on how to fix this? If I remove the bag.fraction and cv.folds arguments it does tend to work, but I don't want to have to drop them.

There is 1 solution below

When you specify cv.folds = 5, gbm uses the parallel package to send each cross-validation fold to a separate core. This might be a new problem with R 4.0 on macOS, or on whatever system you are working with; see this link.
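To check whether the problem lies in the cluster setup itself rather than in gbm, it can help to start a small PSOCK cluster by hand. The following is only a minimal sketch: the helper name can_start_cluster is made up for illustration, and it assumes that a failure inside gbm would also show up when calling makePSOCKcluster directly.

```r
library(parallel)

# Hypothetical helper (not part of gbm or parallel): returns TRUE if `n`
# local PSOCK workers start and respond, FALSE if cluster setup fails.
can_start_cluster <- function(n = 2, ...) {
  cl <- tryCatch(
    makePSOCKcluster(n, ...),
    error = function(e) {
      message("Cluster setup failed: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(cl)) return(FALSE)
  on.exit(stopCluster(cl), add = TRUE)
  # Ask every worker for its process id to confirm that it is responding.
  pids <- unlist(clusterCall(cl, Sys.getpid))
  length(pids) == n
}

# Number of cores R detects on this machine.
detectCores()

ok <- can_start_cluster(2)
ok
```

If ok is FALSE, the error comes from the parallel setup on your machine and not from your gbm call. On R 4.0 or later you could also try can_start_cluster(2, setup_strategy = "sequential"); whether that helps on your system is an assumption, not something confirmed here.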

For now, if your data isn't too large, you can restrict gbm to a single core by setting n.cores = 1. Here it is with an example dataset:

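library(gbm)

# Example data with a Loan.Status outcome column
fl = "https://raw.githubusercontent.com/hrishibawane/DataLit/master/credit_train.csv"
# stringsAsFactors = TRUE restores the pre-R 4.0 default that the steps below rely on
dat = read.csv(fl, stringsAsFactors = TRUE)
# Drop rows with an empty outcome, then incomplete rows and the first two columns
dat = dat[dat$Loan.Status != "",]
dat = droplevels(dat[complete.cases(dat), -c(1:2)])
# Recode the two-level factor outcome to 0/1 for the bernoulli distribution
dat$Loan.Status = as.numeric(dat$Loan.Status) - 1

# n.cores = 1 restricts the cross-validation to a single core
mdl = gbm(formula = Loan.Status ~ ., data = dat, distribution = 'bernoulli',
          n.trees = 100, interaction.depth = 5, bag.fraction = 0.5, cv.folds = 5, n.cores = 1)
mdl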

gbm(formula = Loan.Status ~ ., distribution = "bernoulli", data = dat, 
    n.trees = 100, interaction.depth = 5, bag.fraction = 0.5, 
    cv.folds = 5, n.cores = 1)
A gradient boosted model with bernoulli loss function.
100 iterations were performed.
The best cross-validation iteration was 98.
There were 16 predictors of which 16 had non-zero influence.
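
Once the model has been fitted with cross-validation, the best iteration can be pulled out with gbm.perf and used for prediction. This is just a sketch on simulated data, so that it runs without downloading anything; the data frame sim and the variables x1, x2, x3 and y are invented for illustration and have nothing to do with the credit data above.

```r
library(gbm)

set.seed(1)
n <- 500
# Simulated predictors and a 0/1 outcome (illustrative only)
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = runif(n))
sim$y <- rbinom(n, 1, plogis(1.5 * sim$x1 - sim$x2))

# Same kind of call as above, kept on a single core
fit <- gbm(y ~ ., data = sim, distribution = "bernoulli",
           n.trees = 100, interaction.depth = 3, bag.fraction = 0.5,
           cv.folds = 5, n.cores = 1)

# Iteration with the lowest cross-validation error
best_iter <- gbm.perf(fit, method = "cv", plot.it = FALSE)

# Predicted probabilities using only the first best_iter trees
prob <- predict(fit, newdata = sim, n.trees = best_iter, type = "response")
head(prob)
```

If best_iter comes out close to n.trees (as with 98 out of 100 above), it may be worth refitting with more trees.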