Issues running models in parallel on a supercomputer: do.call is not recognizing my model list in parLapply/clusterApply

I am trying to run a list of models in parallel using parLapply from the parallel package, connecting remotely to a supercomputer (via Compute Canada) to make use of many cores.

When I run the line:

 modsout<-parLapply(cl=cl, X=mods, fun=run_um)

I get the following error:

 models3out<-parLapply(cl=cl, X=mods, fun=run_um)
 Error in do.call(c, clusterApply(cl = cl, x = splitList(X, nchunks), fun 
 = lapply,  : 
 second argument must be a list
 Calls: parLapply -> do.call
 Execution halted

I created my 'mods' list by doing the following for example:

mods<-list(mod1, mod2, mod3, mod4)

After getting the error, I checked str(mods) and it was returned as a "list of 4" so I really don't understand why it is not being recognized in my parLapply line.

Here is an excerpt of my code:

nodeslist = unlist(strsplit(Sys.getenv("NODESLIST"), split=" "))
cl<-makeCluster(nodeslist, type="PSOCK") #make cluster

#load all the data xxx

#create model list
mods<- list(b0<-list(formula='~1~1', data=bear3),
           b1<-list(formula='~trail+elev+precip+temp+hum+cat+tree+month+topo~1', data=bear3),
           b2<-list(formula='~trail~1', data=bear3),
           b3<-list(formula='~elev~1', data=bear3))

run_um<-function(x) {unmarked::occu(as.formula(x[[1]]),x[[2]])} #define the function

clusterExport(cl=cl, varlist=c("bear3", "run_um")) #send data to cores

clusterEvalQ(cl=cl, library(unmarked))#load package on all cores

modsout<-parLapply(cl=cl, X=mods, fun=run_um)

My full job has more than 2000 models, each model takes at least 20 minutes to run, and I then have to run goodness-of-fit tests, which is why I am trying to use HPC. I am still a relative novice at R and extremely new to HPC, so any guidance would be extremely useful to me at this time. Thanks in advance!

Answer by Katia:

Before you parallelize your code, it is always a good idea to check that it runs without parallelization on the first couple of elements of your list. This helps you catch bugs in your code early.

Since I do not have your dataset, I will use the frogs dataset from unmarked to prepare an unmarkedFrameOccu object:

library(unmarked)

# Create an unmarkedFrameOccu object from the frogs data
data(frogs)
pferUMF <- unmarkedFrameOccu(pfer.bin)
siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF)))
obsCovs(pferUMF) <- data.frame(obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF)))

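The second step is to prepare a list (mods in your case). When you create a named list, you should use the "=" sign to name its elements, not "<-" as you do in your code. Inside list(), "<-" assigns a variable and passes its value as an unnamed element: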

# Create a list
mods <- list(b1 = list(formula='~obsvar1~1', data=pferUMF),
             b2 = list(formula='~sitevar1~1', data=pferUMF))
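
To make the difference between "=" and "<-" inside list() concrete, here is a minimal sketch using toy values (the names a and b and the values 1 and 2 are placeholders, not part of the original data):

```r
# Toy values only: a and b are hypothetical names used for illustration.
with_arrow  <- list(a <- 1, b <- 2)  # assigns a and b as a side effect; elements are unnamed
with_equals <- list(a = 1, b = 2)    # creates list elements named "a" and "b"

names(with_arrow)   # NULL
names(with_equals)  # "a" "b"
```

With "=", each model can later be looked up by name, for example with_equals[["a"]].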

We can then define the function:

run_um<-function(x) {unmarked::occu(as.formula(x[[1]]),data=x[[2]])  }

Then make sure it works for a couple of elements of your list:

# Check if the function works on a single list item
run_um(mods[[2]])
# Call:
#   unmarked::occu(formula = as.formula(x[[1]]), data = x[[2]])
# 
# Occupancy:
#   Estimate   SE     z P(>|z|)
# 8.81 29.7 0.297   0.767
# 
# Detection:
#   Estimate    SE       z  P(>|z|)
# (Intercept)   -1.916 0.164 -11.721 9.93e-32
# sitevar1      -0.139 0.178  -0.781 4.35e-01
# 
# AIC: 262.7176 

Next, try a regular lapply over the first few elements before going parallel:

lapply(mods,FUN=run_um)

And if everything goes well, you can then apply parallelization.
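
Once the serial run works, the parallel step might look roughly like the sketch below. It is only an illustration under stated assumptions: a 2-worker local PSOCK cluster stands in for the HPC node list, and lm() on the built-in mtcars data stands in for occu() on the unmarked frame, so that the example runs with base R alone. The name run_model is hypothetical.

```r
library(parallel)

# Assumption: a 2-worker local PSOCK cluster; on an HPC system the node
# list would come from the scheduler instead.
cl <- makeCluster(2)

# Stand-in model list: lm() on mtcars replaces occu() on the unmarked frame.
# Each element carries its own data, so no clusterExport() call is needed here.
mods <- list(m1 = list(formula = "mpg ~ wt", data = mtcars),
             m2 = list(formula = "mpg ~ hp", data = mtcars))

# Same shape as run_um: build the formula from the string and fit the model.
run_model <- function(x) {
  stats::lm(as.formula(x$formula), data = x$data)
}

# Fit the models on the workers and always shut the cluster down afterwards,
# even if one of the fits raises an error.
modsout <- tryCatch(
  parLapply(cl, mods, run_model),
  finally = stopCluster(cl)
)

length(modsout)
```

Stopping the cluster in the finally clause avoids leaving worker processes behind when a long job fails partway through.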