Generate workflow plan for all combinations of inputs in Drake?


I'm trying to create a workflow plan that will run some function my_function(x, y) for all combinations of inputs in my_dataset, but I am stuck on how to generate the commands for drake's workflow without using paste.



library(dplyr)  # provides %>% and tibble()

A <- 'apple'
B <- 'banana'
C <- 'carrot'

my_function <- function(x, y)
    paste(x, y, sep='|IT WORKS|')

my_function(A, B)

# All pairwise combinations of the input names, one pair per row
combos <- combn(c('A', 'B', 'C'), 2) %>% 
    t()

targets <- apply(combos, 1, paste, collapse = '_')

# Build each command as a string, e.g. "my_function(A, B)"
commands <- paste0('my_function(', apply(combos, 1, paste, collapse = ', '), ')') 

my_plan <- tibble(target = targets, command = commands)


> my_plan
# A tibble: 3 x 2
  target command          
  <chr>  <chr>            
1 A_B    my_function(A, B)
2 A_C    my_function(A, C)
3 B_C    my_function(B, C)

The above code works, but I am using paste0 to generate the function call. I don't think this is optimal and it scales poorly. Is there a better way to generate these plans? This may be less of a drake question and more of an rlang question.




DISCLAIMER: This answer shows how to compose expressions using the rlang framework. However, drake expects commands as character strings, so the final expressions would need to be converted to strings.

We begin by capturing A, B and C as symbols using quote, then computing all possible pairwise combinations using the code you already have:

CB <- combn( list(quote(A), quote(B), quote(C)), 2 ) %>% 
    t() %>% as_data_frame()
# # A tibble: 3 x 2
#   V1       V2      
#   <list>   <list>  
# 1 <symbol> <symbol>
# 2 <symbol> <symbol>
# 3 <symbol> <symbol>
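Typing quote() once per input gets tedious as the number of inputs grows. A minimal base R sketch, assuming the inputs are available as a character vector of object names (input_names, input_syms and sym_pairs are illustrative names, not from the question), converts the names to symbols with as.name() and asks combn() for a list of pairs instead of a matrix:

```r
# Names of the input objects; any number of names works the same way
input_names <- c("A", "B", "C")

# Convert each name to a symbol, equivalent to quote(A), quote(B), ...
input_syms <- lapply(input_names, as.name)

# simplify = FALSE returns a list whose elements are pairs of symbols,
# e.g. list(A, B), rather than a list-matrix
sym_pairs <- combn(input_syms, 2, simplify = FALSE)

length(sym_pairs)
# [1] 3
```

A list of pairs avoids the list-matrix and tibble conversion, which some may find easier to iterate over.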

We can now use purrr::map2 to jointly traverse the two columns in parallel and compose our expressions:

CMDs <- purrr::map2( CB$V1, CB$V2, ~rlang::expr( my_function((!!.x), (!!.y)) ) )
# [[1]]
# my_function(A, B)

# [[2]]
# my_function(A, C)

# [[3]]
# my_function(B, C)
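The same composition can be sketched in base R without rlang or purrr, assuming only that the pairs of symbols are available as a list (pairs and cmds are illustrative names). Here bquote() plays the role of !!: each .() splices a symbol into the call template.

```r
# Symbols for the inputs and all pairs of them
input_syms <- lapply(c("A", "B", "C"), as.name)
pairs <- combn(input_syms, 2, simplify = FALSE)

# bquote() substitutes the value of each .() into the template;
# the calls are only built here, never evaluated
cmds <- lapply(pairs, function(p) bquote(my_function(.(p[[1]]), .(p[[2]]))))

cmds[[1]]
# my_function(A, B)
```

For functions with more than two arguments, as.call(c(quote(my_function), p)) builds the same kind of call from a list of symbols of any length.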

As mentioned above, drake expects character strings, so we have to convert our expressions to those:

commands <- purrr::map_chr( CMDs, rlang::quo_name )
# [1] "my_function(A, B)" "my_function(A, C)" "my_function(B, C)"
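For comparison, a base R sketch of the conversion and of the final plan, assuming a plain data.frame is acceptable in place of a tibble. deparse() stands in for rlang::quo_name(), and as.call() builds each call from a list of symbols:

```r
# Build the calls from pairs of symbols
input_names <- c("A", "B", "C")
pairs <- combn(lapply(input_names, as.name), 2, simplify = FALSE)
cmds <- lapply(pairs, function(p) as.call(c(quote(my_function), p)))

# deparse() converts each call to text; paste(collapse = "") joins the
# pieces in case deparse() splits a long call over several strings
commands <- vapply(cmds, function(cmd) paste(deparse(cmd), collapse = ""),
                   character(1))

# Target names such as "A_B", built from the symbol names in each pair
targets <- vapply(pairs,
                  function(p) paste(vapply(p, as.character, character(1)),
                                    collapse = "_"),
                  character(1))

my_plan <- data.frame(target = targets, command = commands,
                      stringsAsFactors = FALSE)

commands
# [1] "my_function(A, B)" "my_function(A, C)" "my_function(B, C)"
```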

The rest of your code should work as before.

Ultimately, it's up to you to decide whether expression arithmetic or string arithmetic is more efficient or readable for your application. If you stay with strings, the stringr package may make string arithmetic more pleasant.



drake now has a map_plan() function that does this.

Original post

Sorry I am late to this thread. A couple of months ago, I added a section on custom metaprogramming to the manual to cover situations like the one you raised. In the example, there is one solution using rlang/tidyeval and an equivalent solution that builds the function calls without tidyeval.

Now that I think of it, this use case is general enough to deserve a straightforward map_plan() function that builds the plan for you. I will work on it.
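The idea behind such a function can be sketched with a small helper. build_call_plan() below is hypothetical: it is not drake's map_plan() and makes no claim about that function's interface. It assumes a data frame with one row per target and one column per argument, and builds a character command that calls the named function with those values as named, unevaluated symbols.

```r
# Hypothetical helper (NOT drake's map_plan()): one target per row of `args`
build_call_plan <- function(args, fun) {
  commands <- vapply(seq_len(nrow(args)), function(i) {
    # Each cell holds an object name; turn it into a symbol so the
    # command refers to the object rather than to the string
    row_args <- lapply(args[i, , drop = FALSE], as.name)
    cl <- as.call(c(as.name(fun), row_args))
    paste(deparse(cl), collapse = "")
  }, character(1))
  # Target names join the argument values, e.g. "A_B"
  targets <- apply(args, 1, paste, collapse = "_")
  data.frame(target = targets, command = commands, stringsAsFactors = FALSE)
}

# All pairwise combinations of the input names, one row per pair
args <- as.data.frame(t(combn(c("A", "B", "C"), 2)), stringsAsFactors = FALSE)
names(args) <- c("x", "y")

build_call_plan(args, "my_function")
```

Because the column names become argument names, the commands come out as my_function(x = A, y = B) and so on, which keeps them correct even if the argument order changes.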

By the way, the command column in your plan can be a list column of language objects rather than a character vector, but you need a character column to use wildcard templating.
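A minimal sketch of the two representations, using a plain data.frame rather than drake's own plan helpers (an assumption made to keep the example dependency-free): the command column first holds call objects, then is deparsed into the character form that wildcard templating needs.

```r
# Calls for every pair of inputs, as language objects
pairs <- combn(lapply(c("A", "B", "C"), as.name), 2, simplify = FALSE)
calls <- lapply(pairs, function(p) as.call(c(quote(my_function), p)))

# Plan with a list column of unevaluated calls
plan_lang <- data.frame(target = c("A_B", "A_C", "B_C"),
                        stringsAsFactors = FALSE)
plan_lang$command <- calls

# Same plan with a character command column
plan_chr <- plan_lang
plan_chr$command <- vapply(plan_lang$command,
                           function(cmd) paste(deparse(cmd), collapse = ""),
                           character(1))
```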