Here is a minimal example showing the issue:
`mod.r`:

```r
#' @export
run_sqrt <- function (x) {
    sqrt(x)
}
```
`mwe.r`:

```r
box::use(
    ./mod[...],
    parallel,
    dp = doParallel,
    foreach[foreach, `%dopar%`],
)

cl <- parallel$makeCluster(2L)
dp$registerDoParallel(cl)

foreach(i = 1 : 5) %dopar% {
    run_sqrt(i)
}

parallel$stopCluster(cl)
```
This raises the error:

```
Error in { : task 1 failed - "could not find function "run_sqrt""
```

I found this suggestion in "How to use `foreach` and `%dopar%` with an `R6` class in R?":

```r
parallel::clusterExport(cluster, setdiff(ls(), "cluster"))
```

But it didn't work.
As you found, this is a limitation of the ‘parallel’ package. It only knows about names defined in the current environment.
There are several solutions for this. The following list is roughly in order of (my personal) preference, from most preferred to least preferred.
1. Use explicitly qualified module access instead of attaching. So:
   1. Change `./mod[...]` to `./mod` inside `box::use()`.
   2. Fully qualify the name inside `foreach`, i.e. call `mod$run_sqrt(i)` instead of `run_sqrt(i)`.

   Due to how `parallel` searches names, this will only work if the above code is executed in the global environment.
2. Import `./mod` inside the `foreach` body instead of at the beginning of your script. However, note that there is currently an open bug regarding this solution.
3. Use `parallel::clusterExport`; this solution works if the correct names are provided, in this case `run_sqrt`. To make the minimal example work, add a `clusterExport` call that exports `run_sqrt` before the `foreach` call.

The reason your version didn’t work is that `ls()` won’t list `run_sqrt`: since the name is attached, it does not exist in the local scope. The same issue would exist with attached packages instead of modules. Furthermore, for reasons I do not understand, `clusterExport` by default searches names in the global environment only; you need to explicitly provide the current environment via `envir = environment()`.
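To illustrate the `envir` point in isolation, here is a minimal sketch that uses only the ‘parallel’ package, without ‘box’ or ‘foreach’. The wrapper function `export_and_call` is a hypothetical name introduced for this illustration, and the locally defined `run_sqrt` is a stand-in for a name that is not defined in the global environment. In the minimal example from the question, the analogous line would presumably be `parallel$clusterExport(cl, "run_sqrt", envir = environment())`, placed before the `foreach` call.

```r
library(parallel)

# Hypothetical wrapper, used only to create a non-global scope.
export_and_call <- function () {
    # Stand-in for a name that exists only in the current (local) scope.
    run_sqrt <- function (x) sqrt(x)

    cl <- makeCluster(2L)
    on.exit(stopCluster(cl))

    # By default, clusterExport() looks up "run_sqrt" in the global
    # environment, where it is not defined here. Passing the current
    # environment makes the lookup start in this function's scope.
    clusterExport(cl, "run_sqrt", envir = environment())

    # The expression is evaluated in each worker's global environment,
    # which now holds the exported copy of run_sqrt.
    clusterEvalQ(cl, run_sqrt(16))
}

result <- export_and_call()
```

`result` is a list with one element per worker. The names to export still have to be listed by hand, which is why `ls()` was not enough for an attached name.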