I need to run my scripts in a particular order: first, 4 R scripts run in parallel using the rstudioapi::jobRunScript() function. None of these scripts imports anything from any environment; each one exports the data frames it creates to the global environment. My 5th R script builds on the data frames created by those 4 scripts, and it currently runs in the console. If there's a way to run the 5th script in the background rather than in the console once the first 4 have finished, that would be a lot better. I'm also trying to reduce the total running time of the whole process.
Although I was able to figure out how to run the first 4 R scripts in parallel, my task isn't done because I can't find a way to trigger the 5th R script. Hope y'all can help me here.
This is a bit too open for my liking. While rstudioapi definitely can be used for running parallel tasks, it is not very versatile and does not give you very useful output. The parallel universe is well implemented in R, with several packages that provide a much simpler and better interface for doing this. Here are 3 options, which also allow something to be 'output' from the different files.
package = parallel
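As a minimal sketch with the parallel package: the four scripts here are hypothetical stand-ins written to temporary files, each assumed to create a data frame named result, and run_script is a helper name made up for illustration. Real scripts would need to be adapted to hand back their data frames in a similar way.

```r
library(parallel)

# Hypothetical stand-ins for the four independent scripts: each one is
# written to a temporary file and builds a data frame called `result`.
script_dir <- tempfile("scripts_")
dir.create(script_dir)
files <- file.path(script_dir, sprintf("script%d.R", 1:4))
for (i in seq_along(files)) {
  writeLines(sprintf("result <- data.frame(id = %d, value = %d * 10)", i, i), files[i])
}

# Helper (illustrative name): source one file in a fresh environment
# and return the data frame it created.
run_script <- function(path) {
  env <- new.env()
  source(path, local = env)
  env$result
}

# Start one worker per script, capped at the number of available cores.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L
cl <- makeCluster(min(length(files), max(1L, n_cores)))

# parLapply() blocks until every worker has finished.
results <- parLapply(cl, files, run_script)
stopCluster(cl)

# Because the call above blocks, a dependent step (the fifth script in the
# question) can simply follow it and use the collected data frames.
combined <- do.call(rbind, results)
print(combined)
```

Since parLapply() only returns once all four scripts are done, whatever depends on their output can be placed directly after it; no separate trigger is needed.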
With the parallel package we can achieve this very simply: create a vector of files to be sourced and execute source in each worker process. The main process will lock while they are running, but if you have to wait for them to finish anyway, this doesn't really matter much.
As a side note, several packages have a similar interface to the parallel package, which was built upon the snow package, so it is a good baseline to have knowledge of.
package = foreach
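As a minimal sketch with foreach, assuming the doParallel package is installed: the four scripts are again hypothetical stand-ins written to temporary files, each assumed to create a data frame named result, and the worker count of 2 is an arbitrary choice for illustration.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-ins for the four independent scripts: each one is
# written to a temporary file and builds a data frame called `result`.
script_dir <- tempfile("scripts_")
dir.create(script_dir)
files <- file.path(script_dir, sprintf("script%d.R", 1:4))
for (i in seq_along(files)) {
  writeLines(sprintf("result <- data.frame(id = %d, value = %d * 10)", i, i), files[i])
}

# foreach needs a registered backend; here a small cluster from parallel.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration sources one file on a worker and returns its data frame.
# .combine = rbind stacks the returned data frames into a single one.
# (.packages and .export can be added to load packages or copy variables
# to the workers explicitly; neither is needed in this small example.)
combined <- foreach(path = files, .combine = rbind) %dopar% {
  env <- new.env()
  source(path, local = env)
  env$result
}

parallel::stopCluster(cl)
print(combined)
```

The loop returns only after all iterations have finished, so code that depends on combined can follow it directly.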
An alternative to the parallel package is the foreach package, which gives something similar to a for-loop interface, simplifying the interface while giving more flexibility and automatically importing necessary libraries and variables (although it is safer to do this manually).
The foreach package does, however, depend on the parallel and doParallel packages to set up a cluster.
While it does add a few lines of code, the .combine, .packages and .export arguments make for a very simple interface to work with parallel computing in R.
package = future
Now this is one of the rarer packages to be used. future provides a parallel interface that is more flexible than both parallel and foreach, allowing for asynchronous parallel programming. The implementation can, however, seem a bit more daunting, and any short example only scratches the surface of what is possible.
Also worth mentioning is that while the future package does provide automatic import of the functions and packages necessary to run code, experience has made me aware that this is limited to only the first level of depth in any call (sometimes less), so exporting is still necessary.
While foreach depends on parallel (or similar) to start a cluster, future will start one itself using all the available cores. A simple call to plan(multiprocess) will start a multi-core session.
Now this might seem quite heavy at first, but the general mechanism is:
1. Call plan(multiprocess).
2. Start some execution using future (or %<-%, which I won't go into).
3. Wait for the execution to finish using resolve, which works on a single future or multiple futures in a list (or environment).
4. Retrieve the result using value for single futures or values for multiple futures in a list (or environment).
5. Reset the future environment by using plan(sequential).
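The mechanism above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the four scripts are hypothetical stand-ins written to temporary files, each assumed to create a data frame named result; run_script is a made-up helper name; and because recent releases of future deprecate multiprocess, the sketch uses plan(multisession) instead and collects results with value() on each future.

```r
library(future)

# Hypothetical stand-ins for the four independent scripts: each one is
# written to a temporary file and builds a data frame called `result`.
script_dir <- tempfile("scripts_")
dir.create(script_dir)
files <- file.path(script_dir, sprintf("script%d.R", 1:4))
for (i in seq_along(files)) {
  writeLines(sprintf("result <- data.frame(id = %d, value = %d * 10)", i, i), files[i])
}

# Helper (illustrative name): source one file and return its data frame.
run_script <- function(path) {
  env <- new.env()
  source(path, local = env)
  env$result
}

# Start background R sessions (assumption: 2 workers, chosen arbitrarily).
plan(multisession, workers = 2)

# Launch one future per script.
futures <- lapply(files, function(path) future(run_script(path)))

# Other code that does not depend on the scripts could run here.

# Wait until all futures are resolved, then collect their values.
resolve(futures)
results <- lapply(futures, value)

# The dependent fifth step can itself be a future, leaving the console free.
fifth <- future(do.call(rbind, results))
combined <- value(fifth)

# Reset to sequential processing, which shuts down the background sessions.
plan(sequential)
print(combined)
```

Running the dependent step as its own future is one way to keep the console free while it works, which is what the question asks for; calling value(fifth) is only needed at the point where the result is actually required.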
I believe these 3 packages provide interfaces to every necessary element of multiprocessing (at least on CPU) that any user needs to interface with. Other packages provide alternative interfaces, while for asynchronous programming I am only aware of future and promises. In general I'd advise most users to be very careful when moving into asynchronous programming, as this can cause a whole suite of problems that are less frequent in synchronous parallel programming.
I hope this may help provide an alternative to the (very limiting) rstudioapi interface, which I am fairly certain was never meant to be used for parallel programming by the users themselves, but was more likely intended to perform tasks such as building a package in parallel by the interface itself.