I have several experiments; I'd like to run one experiment on each node, and each experiment is a sequence of executions that uses several cores. Right now my code looks like:
```r
library(readr)  # for write_lines()

run_seeds <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# write the header line of the error log
write_lines(paste("problem", "run_seed", "num_patients", "method", "name",
                  "type", "error", sep = "\t"), file = err_file_name)

# initialize loop
for (j in seq_along(run_seeds)) {
  ...
}

# start loop
for (i in range_pat) {
  print(paste("iteration", i))
  for (j in seq_along(run_seeds)) {
    run_seed <- run_seeds[[j]]
    set.seed(run_seed)
    ...
    write_lines(paste(problem, run_seed, i, "dst", ind_name, type, error,
                      sep = "\t"), file = err_file_name, append = TRUE)
  }
}
```
Is this task suitable for rslurm? If so, how should I change the code? Looking at the example in https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html, I don't necessarily want to export an rds file or generate a Slurm script; I'd like to run everything within one Slurm script. Is that doable, or do I need to change my code into the format that rslurm expects? Also, the results returned by the nodes need to come back in a certain order. Is that still doable?
If not, what package would you recommend?
Slurm is used with different defaults and different best practices for R at various HPC centers, so it is worthwhile to learn some basics of using Slurm directly before attempting to manage it from an R package like rslurm.
For your multinode experiments, you should write two scripts: a Slurm submission script (say `script.sh`) and an MPI-enabled R script (say `script.R`). The submission script requests the nodes and sets up your software environment (R version, BLAS libraries, MPI version, etc.). This will differ across HPC centers, so you will need to consult your center's documentation on what is available.
Running `module avail` on a login node will tell you what software environments are available. For example, `script.sh` could look like this:
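A minimal sketch; the items in `<...>` and the exact module names are placeholders that depend on your cluster:

```bash
#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH --account=<account>
#SBATCH --nodes=4
#SBATCH --exclusive
#SBATCH --time=00:02:00

## module names depend on how R and MPI are deployed on your cluster
module load openmpi
module load r

## run one instance of script.R per node via OpenMPI
mpirun --map-by ppr:1:node Rscript script.R
```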
Submit the script with `sbatch script.sh`. This requests exclusive access to all cores on 4 nodes, with a 2-minute job limit. Items in `<...>` should be self-explanatory. The modules loaded may differ on your cluster, and other modules may be necessary, depending on how R is deployed. The last line runs one instance of your `script.R` per node, using OpenMPI.

`script.R` uses the package pbdMPI, which you can install from CRAN in an interactive R session on a login node (again, after the appropriate `module load r` and `module load openmpi` for your cluster).

pbdMPI provides RNG reproducibility that is independent of the number of nodes and cores used. Under the covers, it provides functions that manage independent RNG streams from the parallel package, aligned with your application rather than with the resources.
Your `script.R` would look like this:
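A sketch along these lines; `comm.chunk()`, `comm.set.stream()`, `comm.rank()`, and `finalize()` are pbdMPI's documented API, while `range_pat`, the per-rank output file name, and the placeholder experiment body are hypothetical stand-ins for your own code:

```r
## script.R -- every MPI rank (here, one per node) runs this same code
library(pbdMPI)
library(readr)  # for write_lines()

run_seeds <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
range_pat <- c(100, 200, 400)  # hypothetical stand-in for your patient counts

## split the seed indices across the ranks and set up one reproducible
## RNG stream per seed, independent of the number of nodes and cores
my_streams <- comm.chunk(length(run_seeds), form = "vector", rng = TRUE)

## one output file per rank, so ranks don't clobber each other's writes
err_file_name <- paste0("err_rank", comm.rank(), ".tsv")
write_lines(paste("problem", "run_seed", "num_patients", "method", "name",
                  "type", "error", sep = "\t"), file = err_file_name)

for (i in range_pat) {
  for (j in my_streams) {
    run_seed <- run_seeds[[j]]
    comm.set.stream(j)  # stream j continues where it left off;
                        # comm.set.stream(j, reset = TRUE) restarts it
    ## ... your experiment code for this seed and i patients goes here;
    ##     it can use mclapply() internally to spread its executions
    ##     over the cores of this node ...
    error <- mean(rnorm(i))  # hypothetical placeholder result
    write_lines(paste("problem", run_seed, i, "dst", "ind_name", "type",
                      error, sep = "\t"), file = err_file_name, append = TRUE)
  }
}

finalize()
```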
Several things to note here:

- Each instance of `script.R` can use all the cores of its node via `mclapply()`. If you don't need all the cores on a node, you can run more than one instance of this code per node by specifying it in the `--map-by` OpenMPI parameter (see the example after this list).
- Each instance gets its own `my_streams` from `comm.chunk()`. `my_streams` is just a set of index values, which are used to set an independent stream of random numbers for each value.
- The streams continue where they left off with the next `i` value. It is also possible to reset them back to their start with each `i` via the parameter `reset = TRUE`.
- `finalize()` provides a graceful exit from MPI.
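For example, to run two instances of `script.R` per node instead of one (using OpenMPI's processes-per-resource syntax; adjust the count to your needs), the last line of `script.sh` would become:

```bash
## two MPI ranks per node, each using part of the node's cores
mpirun --map-by ppr:2:node Rscript script.R
```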