memory not being released in future with nested plan(multisession)


Recently I've been playing with parallel processing in R using future (and future.apply and furrr), which has mostly been great, but I've stumbled onto something I can't explain. It's possible this is a bug somewhere, but it may also be sloppy coding on my part. If anyone can explain this behavior, it would be much appreciated.

The setup

I'm running simulations on different subgroups of my data. For each group, I want to run the simulation n times and then calculate some summary stats on the results. Here is some example code to reproduce my basic setup and demonstrate the issue I'm seeing:

library(tidyverse)
library(future)
library(future.apply)

# Helper functions

#' Calls out to `free` to get total system memory used (in bytes)
sys_used <- function() {
  .f <- system2("free", "-b", stdout = TRUE)
  # the second line of `free` output is "Mem:"; its third field is memory used
  as.numeric(unlist(strsplit(.f[2], " +"))[3])
}

#' Write an identifier, timestamp, memory usage, and PID to a log file in CSV format
#' @param .f the file to write to
#' @param .id identifier for the row to be written
mem_string <- function(.f, .id) {
  .s <- paste(.id, Sys.time(), sys_used(), Sys.getpid(), sep = ",")
  write_lines(.s, .f, append = TRUE)
}

# Inputs
fake_inputs <- 1:16
nsim <- 100
nrows <- 1e6

log_file <- "future_mem_leak_log.csv"
if (fs::file_exists(log_file)) fs::file_delete(log_file)

test_cases <- list(
  list(
    name = "multisession-sequential",
    plan = list(multisession, sequential)
  ),
  list(
    name = "sequential-multisession",
    plan = list(sequential, multisession)
  )
)

# Test code

for (.t in test_cases) {
  plan(.t$plan)
  
  # loop over subsets of the data 
  final_out <- future_lapply(fake_inputs, function(.i) {
    # loop over simulations
    out <- future_lapply(1:nsim, function(.j) {
      # in real life this would be doing simulations, 
      # but here we just create "results" using rnorm()
      res <- data.frame(
        id = rep(.j, nrows),
        col1 = rnorm(nrows) * .i,
        col2 = rnorm(nrows) * .i,
        col3 = rnorm(nrows) * .i,
        col4 = rnorm(nrows) * .i,
        col5 = rnorm(nrows) * .i,
        col6 = rnorm(nrows) * .i
      )

      # write memory usage to file
      mem_string(log_file, .t$name)

      # in real life I would write res to file to read in later, but here we
      # only return head of df so we know the returned value isn't filling up memory
      res %>% slice_head(n = 10) 
    })
  })

  # clean up any leftover objects before testing the next plan
  try(rm(final_out))
  try(rm(out))
  try(rm(res))
}

The outer loop is for testing two parallelization strategies: whether to parallelize over the subsets of data or over the 100 simulations.
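
As a sanity check on my mental model: with a nested plan, the first element applies to the outermost layer of futures and the second to futures created inside those workers. Here is that understanding restated as a minimal sketch (not part of the test script above):

library(future)

plan(list(multisession, sequential))

f_outer <- future({
  # this block runs on a multisession worker (first element of the plan)
  f_inner <- future(Sys.getpid())  # resolved sequentially on that same worker
  value(f_inner)
})
value(f_outer)  # returns the PID of the multisession worker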

Some caveats

  • I realize that parallelizing over the simulations is not the ideal design, and that chunking the data to send 10-20 simulations to each core would be more efficient (a sketch of that follows this list), but that's not really the point here. I'm just trying to understand what is happening in memory.
  • I also considered that plan(multicore) might be better here (though I'm not sure it would be), but I'm more interested in figuring out what's happening with plan(multisession).
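
For completeness, this is roughly what I mean by chunking. It's just a sketch: run_one_sim is a hypothetical stand-in for my simulation function, and future.chunk.size tells future.apply how many elements to hand each worker at a time:

library(future.apply)
plan(multisession)

run_one_sim <- function(.j) rnorm(10)  # hypothetical stand-in for the real simulation

# send ~20 simulations to each worker per chunk instead of one at a time
out <- future_lapply(1:nsim, run_one_sim, future.chunk.size = 20)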

The results

I ran this on an 8-vCPU Linux EC2 instance (I can give more specs if anyone needs them) and created the following plot from the results (plotting code at the bottom for reproducibility):

[figure: memory profile plot]

First off, plan(list(multisession, sequential)) is faster (as expected; see the caveats above), but what I'm confused about is the memory profile. Total system memory usage remains pretty constant for plan(list(multisession, sequential)), which is what I would expect, since I assumed the res object is overwritten on each pass through the loop.

However, the memory usage for plan(list(sequential, multisession)) steadily grows as the program runs. It appears that each time through the loop the res object is created and then hangs around in limbo somewhere, taking up memory. In my real example this got large enough that it filled my entire (32GB) system memory and killed the process about halfway through.
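
For anyone probing this: one variation worth testing is whether an explicit gc() in each worker changes the profile. Sketched below as a change to the inner loop above (it reuses .i, nrows, and the setup from the full example; I haven't shown results for this variant):

# variation on the inner loop: drop `res` and gc() before returning
out <- future_lapply(1:nsim, function(.j) {
  res <- data.frame(
    id = rep(.j, nrows),
    col1 = rnorm(nrows) * .i
  )
  ret <- head(res, 10)
  rm(res)
  gc()  # ask the worker to release the large object before returning
  ret
})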

Plot twist: it only happens when nested

And here's the part that really has me confused! When I changed the outer future_lapply to a plain lapply and set plan(multisession), I don't see the memory growth. From my reading of the "Future: Topologies" vignette, this should be equivalent to plan(list(sequential, multisession)), yet the plot doesn't show the memory growing at all (in fact, it's almost identical to plan(list(multisession, sequential)) in the plot above). The one-level version is sketched after the plot.

[figure: one-level plot]
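
Concretely, the one-level version replaces the outer future_lapply with a plain lapply. The inner loop body is abbreviated here (the data frame has the same six columns as the full example, and the log label is illustrative):

plan(multisession)

final_out <- lapply(fake_inputs, function(.i) {  # sequential outer loop
  future_lapply(1:nsim, function(.j) {           # parallel inner loop
    res <- data.frame(
      id = rep(.j, nrows),
      col1 = rnorm(nrows) * .i  # ...plus col2-col6 as in the full example
    )
    mem_string(log_file, "lapply-multisession")
    res %>% slice_head(n = 10)
  })
})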

Note on other options

I actually originally found this with furrr::future_map_dfr(), but to be sure it wasn't a bug in furrr, I tried it with future.apply::future_lapply() and got the results shown. I also tried to code it up with just future::future() and got very different results, though quite possibly because what I coded up wasn't actually equivalent (one way to write it is sketched below). I don't have much experience using futures directly without the abstraction layer provided by furrr or future.apply.
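
For reference, one way to express the inner loop with bare futures is sketched here; I can't vouch that my original attempt matched it exactly, which is part of why I'm not sure the comparison was fair:

plan(multisession)

# create all the futures up front, then block on each one for its value
fs <- lapply(1:nsim, function(.j) {
  future({
    res <- data.frame(id = rep(.j, nrows), col1 = rnorm(nrows))
    head(res, 10)
  })
})
out <- lapply(fs, value)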

Again, any insight on this is much appreciated.

Plotting code

library(tidyverse)

logDat <- read_csv("future_mem_leak_log.csv",
                   col_names = c("plan", "time", "sys_used", "pid")) %>%
  group_by(plan) %>% 
  mutate(
    start = min(time), 
    time_elapsed = as.numeric(difftime(time, start, units = "secs"))
  )

ggplot(logDat, aes(x = time_elapsed/60, y = sys_used/1e9, group = plan, colour = plan)) + 
  geom_line() + 
  xlab("Time elapsed (in mins)") + ylab("Memory used (in GB)") +
  ggtitle("Memory Usage\n  list(multisession, sequential) vs list(sequential, multisession)") 
