I am having trouble figuring out how to pass a parallel cluster defined within one function for use in another function. I can't seem to set the environments so that the inner function can find the cluster and all the variables exported to the cluster.
The actual functions are quite long. Here is a toy example:
library(pacman)
p_load(dplyr, stats, parallel, doParallel, foreach)
testfunction <- function(y, cl){
  val2 <- y^2
  parallel::clusterExport(cl, list("val2"), envir = environment())
  # this is the line that fails: test_list is not found
  interim_list <- foreach(p = seq(1, 2)) %:%
    foreach(i = test_list[[p]], .combine = 'rbind') %dopar% {
      out_val <- sum(i) + val + val2
      out_val
    }
  obj <- sum(unlist(interim_list))
  return(obj)
}
call_function <- function(){
  test_list <- list()
  test_list[[1]] <- c(1)
  test_list[[2]] <- c(3)

  cl <- parallel::makeCluster(4)
  doParallel::registerDoParallel(cl)
  parallel::clusterExport(cl, list("test_list"), envir = environment())

  test_vals <- seq(1, 3, 1)
  for (val in test_vals){
    parallel::clusterExport(cl, list("val"), envir = environment())
    stats::optim(par = c(.15), testfunction, cl = cl, method = 'BFGS')
  }
  parallel::stopCluster(cl)
}
call_function()
As written, the code fails on the foreach line with an error that it cannot find test_list. I have also tried different envir arguments and passing the local environment of call_function() into testfunction(), without success. In this toy example it would be easy to drop the parallel processing, pass test_list to testfunction() as a function argument (sketched below), or define the cluster inside testfunction(), but implementing similar solutions in the full code would slow it down too much.
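For reference, this is roughly what the argument-passing workaround looks like in the toy example (a minimal sketch with a hypothetical testfunction2; test_list and val become explicit arguments, and doParallel should then pick up val and val2 from the function's environment on its own, so the clusterExport calls are no longer needed):

testfunction2 <- function(y, test_list, val){
  val2 <- y^2
  # test_list[[p]] is evaluated on the master; val and val2 should be
  # auto-exported to the workers by doParallel
  interim_list <- foreach(p = seq(1, 2)) %:%
    foreach(i = test_list[[p]], .combine = 'rbind') %dopar% {
      sum(i) + val + val2
    }
  sum(unlist(interim_list))
}
# called as: stats::optim(par = c(.15), testfunction2, test_list = test_list, val = val, method = 'BFGS')

This works here because everything fits in a couple of arguments, but the real functions would need far too much data passed this way.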
Any suggestions would be greatly appreciated!