calling parallel::clusterExport and foreach from multiple functions (scoping)


I am having trouble figuring out how to pass a parallel cluster defined within one function for use in another function. I can't seem to set the environments so that the inner function can find the cluster and all the variables exported to the cluster.

The actual functions are quite long. Here is a toy example:

    library(pacman)
    p_load(dplyr, stats, parallel, doParallel)
    
    testfunction <- function(y, cl){
      val2 <- y^2
      parallel::clusterExport(cl, list("val2"), envir=environment())
      # Fails here: object 'test_list' not found
      interim_list <- foreach(p = seq(1, 2)) %:%
        foreach(i = test_list[[p]], .combine = 'rbind') %dopar% {
          out_val <- sum(i) + val + val2
          out_val
        }
      obj <- sum(unlist(interim_list))
      return(obj)
    }
    
    call_function <- function(){
      test_list <- list()
      test_list[[1]] <- c(1)
      test_list[[2]] <- c(3)
      cl <- parallel::makeCluster(4)
      doParallel::registerDoParallel(cl)
      parallel::clusterExport(cl, list("test_list"), envir = environment())
      test_vals <- seq(1,3,1)
      for (val in test_vals){ 
        parallel::clusterExport(cl, list("val"), envir = environment())
        stats::optim(par = c(.15), testfunction, cl = cl, method = 'BFGS')
      }
      parallel::stopCluster(cl)
    }
    
    call_function()

As written, the code fails on the foreach line because it can't find test_list. I have also tried different envir arguments and passing the local environment of call_function() to testfunction(). In this toy example it would be easy to drop the parallel processing, pass test_list to testfunction() as a function argument, or define the cluster inside testfunction(), but similar changes in the full code would slow it down too much.
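For reference, here is a sketch of the toy reworked so it runs. The key point is that the outer iterator expression test_list[[p]] is evaluated on the master inside testfunction(), while clusterExport() only fills the workers' global environments, so it can never make test_list visible there. This version passes the objects as arguments (which, as noted above, may be too slow for the real code), and the argument names are my own:

```r
library(parallel)
library(doParallel)  # attaches foreach as a dependency

# Hypothetical rework: test_list and val become arguments, so the master-side
# iterator test_list[[p]] can find test_list in the function's own scope.
testfunction <- function(y, cl, test_list, val) {
  val2 <- y^2  # local; foreach auto-exports val and val2 to the workers
  interim_list <- foreach(p = seq_along(test_list)) %:%
    foreach(i = test_list[[p]], .combine = 'rbind') %dopar% {
      sum(i) + val + val2
    }
  sum(unlist(interim_list))
}

call_function <- function() {
  test_list <- list(c(1), c(3))
  cl <- parallel::makeCluster(2)
  doParallel::registerDoParallel(cl)
  on.exit(parallel::stopCluster(cl))
  # objective is 4 + 2*val + 2*y^2, so optim() converges near y = 0
  vapply(seq(1, 3), function(val) {
    stats::optim(par = c(0.15), testfunction, cl = cl,
                 test_list = test_list, val = val, method = 'BFGS')$value
  }, numeric(1))
}

call_function()
```

This is only meant to illustrate where the lookup happens (master vs. workers), not to claim it is the right design for the full-scale code.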

Any suggestions would be greatly appreciated!
