Using tidy evaluation with furrr


I want to make the following function run in parallel using the furrr package instead of the purrr package.

library(furrr)
library(tidyverse)

input <- list(element1 = tibble::tibble(a = c(1, 2), b = c(2, 2)),
              element2 = tibble::tibble(a = c(1, 2), b = c(4, 4))
)

multiplier <- function(data, var1, var2){
  purrr::map_df(.x = data,
                .f = ~ .x %>% 
                  dplyr::mutate(product = {{var1}} * {{var2}})
  )
}

multiplier(input, a, b)

However, when I simply swap in the furrr equivalent, I get an error.

multiplier_parallel <- function(data, var1, var2){
  furrr::future_map_dfr(.x = data,
                .f = ~ .x %>% 
                  dplyr::mutate(product = {{var1}} * {{var2}})
  )
}

future::plan(multiprocess)

multiplier_parallel(input, a, b)
Error in get(name, envir = env, inherits = FALSE) : 
Identified global objects via static code inspection (structure(function (..., .x = ..1, .y = ..2, . = 
..1); .x %>% dplyr::mutate(product = {; {; var1; }; } * {; {; var2; }; }), class = 
c("rlang_lambda_function", "function"))). Object 'a' not found 

I assume the reason is that the future package tries to identify every variable that has to be exported to the workers. In this case it treats the column name "a" as a global variable, cannot find it, and throws the error.

When I hardcode the column names in the call it works, but then the function no longer accepts arbitrary variable names:

multiplier_parallel <- function(data, var1, var2){
  furrr::future_map_dfr(.x = data,
                .f = ~ .x %>% 
                  dplyr::mutate(product = a * b)
  )
}

multiplier_parallel(input, a, b)

I have tried several things so far, including passing the names through .future_options, but none of them worked. Is there any way to make this work? My actual function is quite a bit more complex, but I assume the principle is the same. It would be great if someone could help!


There are 3 best solutions below

8
On BEST ANSWER

future tries to automatically determine the global variables used in your code. Because of the tidy evaluation, it identifies a and b as globals but cannot find them. You can disable this detection with future_options(globals = FALSE) (in furrr 0.2.0 and later the helper is called furrr_options()).

future::plan(future::multiprocess)

input <- list(element1 = tibble::tibble(a = c(1, 2), b = c(2, 2)),
              element2 = tibble::tibble(a = c(1, 2), b = c(4, 4))
)

multiplier_parallel <- function(data, var1, var2){
  furrr::future_map_dfr(.x = data,
                        .f = ~ .x %>% 
                          dplyr::mutate(product = {{var1}} * {{var2}}),
                        .options = furrr::future_options(globals = FALSE)
  )
}

multiplier_parallel(input, a, b)
# A tibble: 4 x 3
      a     b product
  <dbl> <dbl>   <dbl>
1     1     2       2
2     2     2       4
3     1     4       4
4     2     4       8
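
To see why a and b get flagged, it can help to look at what static inspection reports for an expression like the one inside the lambda. The following is only a sketch, assuming the globals package is installed; the expression is a simplified stand-in for the real call, and df is a hypothetical placeholder name that does not need to exist.

```r
library(globals)

# A simplified stand-in for the body of the lambda.
# `df`, `a` and `b` are never defined here: the inspection is purely static.
expr <- quote(mutate(df, product = a * b))

# findGlobals() only walks the code and collects names that look global.
found <- findGlobals(expr)

# The column names show up next to genuine globals such as `mutate`,
# because static inspection cannot know that dplyr will look them up
# inside the data frame.
c("a", "b") %in% found
```

With globals = FALSE this detection step is skipped, so nothing tries to locate a and b outside the data.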
0
On

At the lowest level, this seems to be a bug in the globals package, which furrr uses to find the global variables that need to be exported to the workers. I have reported it upstream at https://github.com/HenrikBengtsson/globals/issues/65

The issue is related to NSE (non-standard evaluation) and to where globals "looks" for the global variables. It can be reproduced with just globals and base R. With globals 0.13.0, I get the following:

library(globals)

fn <- function(expr) {
  expr <- substitute(expr)
  eval(expr, envir = mtcars)
}

fn(cyl)
#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

expr <- quote(fn(cyl))

globalsOf(expr)
#> Error in globalsByName(names, envir = envir, mustExist = mustExist): Identified global objects via static code inspection (fn(cyl)). Failed to locate global object in the relevant environments: 'cyl'

The error message is a bit different, but I am fairly certain it is the same underlying issue.

Curiously, no error occurs when the column is hardcoded, even though evaluation is still delayed. That is, the following works (the result is rather long, so I won't show the output):

library(globals)

fn <- function() {
  expr <- quote(cyl)
  eval(expr, envir = mtcars)
}

fn()

expr <- quote(fn())

globalsOf(expr)
1
On

Ah, in my previous answer I forgot that this is one of the "Common gotchas" of furrr. The previous answer is not necessarily incorrect and provides some extra insight, so I'll leave it. See this article for more information: https://davisvaughan.github.io/furrr/articles/articles/gotchas.html#non-standard-evaluation-of-arguments

Unlike purrr, furrr has to evaluate each argument once ahead of time so that it can be shipped off to the workers. This means that arguments relying on NSE behave differently. You can work around this by defusing the arguments first with enquo() and then forcing their evaluation inside the furrr function with !!. Defusing them ahead of time turns var1 and var2 into objects that can be shipped off to the workers.

input <- list(
  element1 = tibble::tibble(a = c(1, 2), b = c(2, 2)),
  element2 = tibble::tibble(a = c(1, 2), b = c(4, 4))
)

multiplier_parallel <- function(data, var1, var2) {
  var1 <- rlang::enquo(var1)
  var2 <- rlang::enquo(var2)
  
  furrr::future_map_dfr(
    .x = data,
    .f = ~dplyr::mutate(.x, product = !!var1 * !!var2)
  )
}

future::plan(future::multisession, workers = 2)

multiplier_parallel(input, a, b)
#> # A tibble: 4 x 3
#>       a     b product
#>   <dbl> <dbl>   <dbl>
#> 1     1     2       2
#> 2     2     2       4
#> 3     1     4       4
#> 4     2     4       8

Note that we typically encourage the {{ }} embracing pattern over !!enquo(), but in some very rare cases like this one the separation of defusing/forcing is required.
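
To make "defusing" a bit more concrete, here is a minimal sketch that uses rlang alone, with no parallel backend involved. The helper capture_column() and the data frame df are hypothetical names introduced only for illustration. It shows that enquo() yields an ordinary R object (a quosure), which is evaluated against the data only later.

```r
library(rlang)

# Hypothetical helper: defuse the argument instead of evaluating it.
capture_column <- function(var) {
  enquo(var)
}

q1 <- capture_column(a)
q2 <- capture_column(b)

# A quosure is a regular object holding the expression and its environment,
# so it can be stored, passed around, or serialized like any other value.
is_quosure(q1)
as_label(q1)

# Evaluation happens later, against whatever data is supplied.
df <- data.frame(a = c(1, 2), b = c(2, 2))
product <- eval_tidy(expr((!!q1) * (!!q2)), data = df)
product
```

Until eval_tidy() (or dplyr::mutate() in the answer above) is called, nothing tries to look up a or b, which is why the defused arguments survive the trip to the workers.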