Running multiple parallel processes in parallel in R


I run Bayesian statistical models with each chain on a separate processing node using the runjags package in R. I want to fit multiple models at once by nesting run.jags calls in a parallel loop using the foreach package. However, this often results in error messages, likely because the foreach loop doesn't "know" that I am launching other parallel processes inside the loop, so cores probably get double-allocated. Here is an example error message:

Error in { : task 2 failed - "The following error was encountered while attempting to run the JAGS model:
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, : cannot open the connection

Here is some example code to generate data and fit two models that request 2 cores each (4 cores in total, which I have on my laptop). I would love to find a solution that allows me to run multiple parallel JAGS models in parallel. In reality I am running 5-10 models at a time on a cluster, each requiring 3 cores.

library(foreach)
library(doParallel) #needed for registerDoParallel() below
library(runjags)

#generate two random response vectors, mean of 25, sd = 5.----
y.list <- list()
for(i in 1:2){
  y.list[[i]] <- rnorm(100, 25, sd = 5)
}

#Specify a JAGS model to fit an intercept.----
jags.model <- "
model{
  for(i in 1:N){
    y.hat[i] <- intercept
    y[i] ~ dnorm(y.hat[i], tau)
  }

  #specify priors.
  intercept ~ dnorm(0,1E-3)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
"

n.cores <- 4
registerDoParallel(n.cores)

#Fit models in parallel, with chains running in parallel.----
#two processes that each require two cores (4 cores are registered, which is all that is required).
output <-
foreach(i = 1:length(y.list)) %dopar% {
    #specify data object.
    jd <- list(y=y.list[[i]], N = length(y.list[[i]]))
    #fit model.
    jags.out <- run.jags(jags.model,
                         data=jd,
                         n.chains=2,
                         monitor=c('intercept','tau'),
                         method='rjparallel')
    #return output
    return(jags.out)
}

2 Answers

Answer 1 (score: 2)

I am unable to run your sample, but the following vignette should help you out.

You may want to try using the foreach nesting operator %:%

https://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf

#%:% must be followed by another foreach() call, so the nesting
#looks like this (a sketch: one chain per inner task, with the
#chains themselves run serially via method='rjags' so that only
#foreach provides the parallelism):
foreach(i = 1:length(y.list)) %:%
  foreach(chain = 1:2) %dopar% {
    #specify data object.
    jd <- list(y = y.list[[i]], N = length(y.list[[i]]))
    #fit one chain of the model; the last value is returned.
    run.jags(jags.model,
             data = jd,
             n.chains = 1,
             monitor = c('intercept', 'tau'),
             method = 'rjags')
  }
Answer 2 (score: 0)

There are two things to consider here: how to nest parallel foreach() loops in general, and how to solve this particular issue.

The solution to nesting parallel foreach() loops comes from @Carlos Santillan's answer above, and is based on the vignette linked in that answer. Let's say we have one inner loop nested within an outer loop, similar to the problem above, but instead of the parallel call to run.jags we have a parallel foreach() call:

#begin nested loop: %:% chains the outer foreach() with the inner
#foreach(), and %dopar% runs the inner bodies in parallel. With %:%
#the outer loop's body must be exactly the inner foreach() call, so
#any per-i setup moves inside the inner body:
outer_list <-
foreach(i = 1:length(some_index)) %:%
  foreach(k = 1:length(some_index)) %dopar% {
    #grab something to feed the inner computation.
    to_inner <- grab_data[[i]]
    #do some other function; the last value is returned.
    some_function(to_inner)
  }

#any per-element post-processing happens after the nested loop:
out_outer <- lapply(outer_list, some_function)

The key is using the %:% operator in the outer loop, and the %dopar% operator in the inner loop.
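For illustration, here is a minimal, self-contained sketch of that pattern with a toy computation standing in for the model fit (the multiplication is just a placeholder):

library(foreach)
library(doParallel)

registerDoParallel(2)

#outer loop chained with %:%, inner loop parallelized with %dopar%;
#res is a list of 2 lists, each holding 3 inner results.
res <- foreach(i = 1:2) %:%
  foreach(k = 1:3) %dopar% {
    i * k
  }

stopImplicitCluster()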

This will not solve the above run.jags() nested parallel problem, however, since it is not a nested foreach() loop. To solve this particular problem I changed the method setting in run.jags() from method='rjparallel' to method='parallel'. run.jags() has multiple different parallel implementations, and this particular one seems to work based on my timing analyses. Hopefully in the future there will be a more definitive answer as to why this works; I just know that it does.
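
Applied to the question's example, the only change is the method argument. A minimal sketch, reusing the y.list data and jags.model string defined in the question:

library(foreach)
library(doParallel)
library(runjags)

registerDoParallel(4)

#fit both models in parallel; with method='parallel', each run.jags()
#call runs its two chains in separate JAGS processes.
output <- foreach(i = 1:length(y.list)) %dopar% {
  jd <- list(y = y.list[[i]], N = length(y.list[[i]]))
  run.jags(jags.model,
           data = jd,
           n.chains = 2,
           monitor = c('intercept', 'tau'),
           method = 'parallel')
}

stopImplicitCluster()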