I just noticed that when launching multiple sessions, each one is ~100 MB.
This gets even worse when launching from within Shiny, where the size jumps to 200 MB.
I tried to limit the memory by removing globals and packages:
library(future)
library(furrr)

packages_to_load <- c("paws", "jsonlite")
plan(multisession, workers = 10)

results_calc_rds <- future_map(
  .x = tokens,
  .f = my_fun,
  .options = furrr::furrr_options(seed = NULL,
                                  globals = FALSE,
                                  packages = packages_to_load)
)
But it doesn't seem to have an impact.
Does anyone have an idea how to make these sessions as small as possible?
All I need is the paws and jsonlite packages to do some AWS invokes.
Thank you!

A few things you can try:

- Instead of specifying packages to load in future_map, load them before planning your workers (see the sketch at the end of this answer).
- Use explicit namespacing for functions, e.g. jsonlite::fromJSON() instead of attaching jsonlite, so entire packages are not attached to the search path.
- Make sure only the necessary global variables and data are sent to the workers; avoid passing large datasets if they are not required.

By pre-loading the packages and using explicit namespacing, you should avoid the overhead of loading packages in each worker.
Keep in mind that R's memory usage may not immediately reflect the savings, since garbage collection is not instantaneous. As noted in the "Memory usage and garbage collection" section of Advanced R by Hadley Wickham, you never need to call gc() yourself to prompt R to clean up unused memory; the garbage collector runs automatically whenever R needs more space.

A few more options:

- If you need to load other packages and data within your scripts and are concerned about memory usage, consider lazy loading: use requireNamespace() to load a package's namespace only at the point where it is actually needed, without attaching it to the search path (a sketch follows below).
- If you need to pass large datasets to your workers, consider compressing them before sending and decompressing them within the worker function.
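For example, a minimal sketch of lazy loading inside the worker function, assuming you are invoking a Lambda function with paws (the function name and payload structure are made up for illustration):

my_fun <- function(token) {
  # Load the namespaces only when they are actually needed, and fail
  # with a clear message if a package is missing on the worker
  if (!requireNamespace("paws", quietly = TRUE) ||
      !requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Packages 'paws' and 'jsonlite' are required on the workers")
  }

  lambda <- paws::lambda()
  resp <- lambda$invoke(
    FunctionName = "my-function",  # hypothetical Lambda function name
    Payload = jsonlite::toJSON(list(token = token), auto_unbox = TRUE)
  )

  # paws returns the payload as a raw vector of JSON
  jsonlite::fromJSON(rawToChar(resp$Payload))
}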
- Convert data frames to more memory-efficient objects such as data.table, or use the fst package for fast serialization of data frames (see the sketch below).
- Rely on the future package's ability to identify and transfer only the necessary parts of global variables to the workers, rather than exporting everything manually.
- If running on a server, use Docker containers to set a memory limit for each R session.
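If the data rather than the packages is what inflates the workers, here is a sketch of the fst approach. It assumes big_df is a large data frame in your session and that my_fun has been changed to take the data as a second argument; both are assumptions on my part:

library(furrr)
library(fst)

# Write the data once, then let each worker read it from disk instead
# of having it serialized into every worker process
path <- tempfile(fileext = ".fst")
write_fst(big_df, path)

results <- future_map(tokens, function(token) {
  df <- fst::read_fst(path)   # fast, file-based deserialization
  my_fun(token, df)           # hypothetical two-argument version of my_fun
})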
Putting the first few suggestions together, your code could look something like this (a sketch; tokens and my_fun are taken from your question):
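library(future)
library(furrr)

# Load the packages once in the main session instead of asking furrr
# to load them inside every worker
library(paws)
library(jsonlite)

plan(multisession, workers = 10)

results_calc_rds <- future_map(
  .x = tokens,
  .f = my_fun,   # use paws:: and jsonlite:: explicitly inside my_fun
  .options = furrr_options(seed = NULL,
                           globals = FALSE)  # don't export globals automatically
)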
By conditionally loading packages and data only when needed, you can minimize the memory footprint of each worker.