Poor speed gain in using `future` for parallelization


I find that the speed gain from using the future (and furrr) packages for parallelization in R is not satisfactory. In particular, the speedup is not close to linear. My machine has 4 cores, so I expected the speedup to be roughly linear as long as the number of workers I specify does not exceed the number of cores available. However, this is not the case.
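For reference, the core count assumed above can be checked along the following lines. This is only a sketch: it was not part of the timed runs, and whether the "4" refers to physical cores or to logical cores (hardware threads) is not stated in the question.

```r
library(future)

# Logical cores (hardware threads) reported by the operating system
n_logical <- parallel::detectCores(logical = TRUE)

# Physical cores; this can return NA on some platforms
n_physical <- parallel::detectCores(logical = FALSE)

# Cores that future considers available to the current R session
n_available <- availableCores()

print(c(logical = n_logical, physical = n_physical, available = n_available))
```

If the machine has fewer physical cores than logical ones, near-linear scaling up to the logical count would be a less safe expectation.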

The following example illustrates the problem: I draw 10^7 uniform random numbers and compute their mean, and I repeat this 500 times.

library(future)
library(furrr)

# Parameters
n <- 1e7
m <- 500

# Compute the mean of n uniform random numbers
# (x is the iteration index passed by future_map and is not used)
rmean <- function(x, n) {
  rand.vec <- runif(n)
  rand.mean <- mean(rand.vec)
  return(rand.mean)
}

# Record the time used to compute the mean of n numbers, repeated m times
rtime <- function(m, n) {
  t1 <- Sys.time()
  temp <- future_map(.x = 1:m,
                     .f = rmean,
                     n = n,
                     .options = furrr::furrr_options(seed = TRUE))
  t2 <- Sys.time()
  # Print the time used
  print(t2 - t1)
  return(temp)
}

# Print the time used for different numbers of workers
plan(multisession, workers = 1)
set.seed(1)
x <- rtime(m, n)
# Time difference of 2.503885 mins

plan(multisession, workers = 2)
set.seed(1)
x <- rtime(m, n)
# Time difference of 1.341357 mins

plan(multisession, workers = 3)
set.seed(1)
x <- rtime(m, n)
# Time difference of 57.25641 secs

plan(multisession, workers = 4)
set.seed(1)
x <- rtime(m, n)
# Time difference of 47.31929 secs

In the above example, the speed gains I get relative to 1 worker are:

  • 1.87x for 2 workers
  • 2.62x for 3 workers
  • 3.17x for 4 workers
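The figures above can be reproduced from the recorded timings with a few lines of base R. The sketch below also computes parallel efficiency (speedup divided by the number of workers), a quantity the question does not report but which makes the shortfall from linear scaling explicit. The variable names are illustrative only.

```r
# Wall-clock times copied from the runs above, converted to seconds
secs <- c(2.503885 * 60,  # 1 worker
          1.341357 * 60,  # 2 workers
          57.25641,       # 3 workers
          47.31929)       # 4 workers
workers <- 1:4

# Speedup relative to the single-worker run
speedup <- secs[1] / secs

# Parallel efficiency: 1 would correspond to perfectly linear scaling
efficiency <- speedup / workers

print(data.frame(workers,
                 speedup = round(speedup, 2),
                 efficiency = round(efficiency, 2)))
```

With these timings the efficiency falls from about 0.93 with 2 workers to about 0.79 with 4.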

The speedup in the above example is not close to linear, especially with 4 workers. I thought this might be due to one-time overhead from calling the plan function. However, the timings are similar when I run the procedure several times after setting the number of workers, as shown below:

plan(multisession, workers = 3)
set.seed(1)
x <- rtime(m, n)
# Time difference of 58.07243 secs
set.seed(1)
x <- rtime(m, n)
# Time difference of 1.012799 mins
set.seed(1)
x <- rtime(m, n)
# Time difference of 57.96777 secs

I also tried the future_lapply function from the future.apply package in place of the future_map function from the furrr package, but the speedup was similar. I would appreciate any advice on what is going on here. Thank you!
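The future_lapply call itself is not shown in the question. A minimal sketch of what an equivalent call might look like is given below. The reduced values of n and m and the choice of 2 workers are assumptions made only so that the sketch runs quickly; they are not the settings used for the timings above.

```r
library(future)
library(future.apply)

plan(multisession, workers = 2)

# Smaller than the 1e7 and 500 used above, only to keep the sketch quick
n <- 1e5
m <- 8

# Same computation as rmean() above; x is the unused iteration index
rmean <- function(x, n) {
  mean(runif(n))
}

# future.seed = TRUE requests parallel-safe random number streams,
# playing the role of furrr_options(seed = TRUE) in the future_map version
res <- future_lapply(1:m, rmean, n = n, future.seed = TRUE)

# Return to sequential processing, which shuts down the background workers
plan(sequential)
```

Since furrr and future.apply both run on top of future, similar timings from the two would not be surprising.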
