Parallel computing with R on a SLURM cluster


I need to estimate a model using MCMC on a SLURM cluster (the system is CentOS), and the estimation takes a very long time to finish.

Within each MCMC iteration, there is one step that takes particularly long. This step is an lapply loop over around 100000 elements (about 30s to finish all of them serially), so, as far as I understand, it should be a good candidate for parallel computing.
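
For reference, the slow step on its own looks roughly like this (a sketch; `some_operation()` stands in for the real per-element work):

  # Serial version of the slow step: `data` is a list of ~100000 elements,
  # some_operation() is a stand-in for the actual per-element computation.
  data2 <- lapply(data, function(data_i) {
    data_i <- some_operation(data_i)
    list(beta = data_i$beta, sigma = data_i$sigma)
  })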

I tried several packages (doMC, doParallel, doSNOW) together with the foreach framework. The setup is:

  parallel_cores <- 8

  # doParallel
  library(doParallel)
  cl <- makeCluster(parallel_cores)
  registerDoParallel(cl)

  # doMC (fork-based; Linux/macOS only)
  library(doMC)
  registerDoMC(parallel_cores)

  # doSNOW
  library(doSNOW)
  cl <- makeCluster(parallel_cores)
  registerDoSNOW(cl)


  # foreach framework; `data` is a list
  data2 <- foreach(data_i = data,
                   .packages = c("somePackage")) %dopar% {
    data_i <- some_operation(data_i)
    list(beta = data_i$beta, sigma = data_i$sigma)
  }
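
The result is a list of two-element lists, which I then unpack along these lines (a sketch, assuming the beta/sigma structure above):

  # Unpack the per-element estimates from the foreach result.
  betas  <- lapply(data2, `[[`, "beta")
  sigmas <- lapply(data2, `[[`, "sigma")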

Using doMC, the time for this step dropped to about 9s. However, since doMC uses forked shared memory and I have a large array storing the estimation results, I quickly ran out of memory (slurmstepd: error: Exceeded job memory limit).
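
Would writing each worker's results to disk, instead of returning and accumulating everything in the master process, be a reasonable workaround? A minimal sketch of what I have in mind (the directory, file names, and chunking are illustrative; splitIndices() is in the base parallel package):

  # Sketch: split the work into one chunk per core and let each forked
  # worker save its own results with saveRDS(), so the master never has
  # to hold the full results array at once. Paths are illustrative only.
  library(doMC)
  library(parallel)  # for splitIndices()

  parallel_cores <- 8
  registerDoMC(parallel_cores)

  out_dir <- "mcmc_chunks"
  dir.create(out_dir, showWarnings = FALSE)

  idx_chunks <- splitIndices(length(data), parallel_cores)

  invisible(foreach(k = seq_along(idx_chunks)) %dopar% {
    res <- lapply(data[idx_chunks[[k]]], function(data_i) {
      data_i <- some_operation(data_i)
      list(beta = data_i$beta, sigma = data_i$sigma)
    })
    saveRDS(res, file.path(out_dir, sprintf("chunk_%02d.rds", k)))
    NULL  # return nothing bulky to the master
  })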

Using doParallel and doSNOW, the time for this step actually increased, to about 120s, which seems ridiculous. The mysterious thing is that when I tested the same code on both my Mac and my Windows machine, doParallel and doSNOW gave similar speed to doMC.
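
My current guess is that on the cluster the per-task communication overhead of the socket (PSOCK) workers dominates, since each of the 100000 tasks is tiny and gets dispatched one at a time. Would chunking the tasks, so that each worker receives one large block instead of thousands of small ones, be the right fix? A sketch of what I mean (splitIndices() is in the base parallel package; names are illustrative):

  # Sketch: one foreach task per worker instead of one per list element,
  # to amortize the serialization overhead of a socket (PSOCK) cluster.
  library(doParallel)
  library(parallel)  # for splitIndices()

  parallel_cores <- 8
  cl <- makeCluster(parallel_cores)
  registerDoParallel(cl)

  idx_chunks <- splitIndices(length(data), parallel_cores)

  data2 <- foreach(idx = idx_chunks,
                   .combine  = c,
                   .packages = c("somePackage")) %dopar% {
    lapply(data[idx], function(data_i) {
      data_i <- some_operation(data_i)
      list(beta = data_i$beta, sigma = data_i$sigma)
    })
  }

  stopCluster(cl)

I would also create the cluster once, outside the MCMC loop, so the makeCluster() startup cost is not paid on every iteration.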

I'm stuck and not sure how to proceed. Any suggestions would be greatly appreciated!
