Conditional static branching in R targets

I'm using the targets pipeline toolkit in R and am wondering how best to do static branching. I have a set of parameters, and I'd like to compute results for most but not all of their combinations. Note that N_source_components and N_target_components aren't used by the agg_neighbourhoods target below; they are used by other targets that I left out of this example. With the current setup, agg_neighbourhoods is run too many times (tar_map creates a copy of each target for every row of values and doesn't know that not all columns are relevant to all targets, right?). Is there a smarter way?

I already tried nesting another tar_map call inside the one shown and moving N_source_components and N_target_components into it. This removes the redundant runs of agg_neighbourhoods, but it no longer lets me filter out unwanted combinations as I do now, because the value of query isn't known to the inner call at 'compilation' time, that is, when the pipeline is defined.

Many thanks :)

tar_map(
  values = tidyr::expand_grid(
    query = c('6369', '6489', '6493'),
    k = c(10, 30, 50),
    d = c(5, 10, 15),
    genelist = c(
      'informativeV15',
      'informativeV15_monotonic',
      'informativeV15_monoreporter'
    ),
    N_source_components = 10L,
    N_target_components = as.integer(c(3, 5))
  ) %>%
  dplyr::filter(
    !(query %in% c('6369') & N_target_components > 3)) %>%
  { . },

  tar_target(agg_neighbourhoods, {
    f(
      so = tar_read(so_target, branch = e2i(query))[[1]],
      genelist = genelist,
      k = k,
      d = d
    )
  }, iteration = 'list')
)

Answer:

Hopefully this is helpful to someone. In simpler terms, my problem was that targets were being run needlessly, because I need to filter out some parameter combinations and not every target uses every parameter. A simpler, complete example of this scenario would be:

tar_map(
  # All A x B pairs, minus the unwanted ones
  values = tidyr::expand_grid(A = 1:2, B = 1:4) %>%
    dplyr::filter(!(A == 2 & B > 2)),

  tar_target(tarX, A*3),        # uses A only

  tar_target(tarY, A*4 + B^2)   # uses A and B
)

Here tarX is built once for every remaining (A, B) row, whereas one evaluation per value of A is enough. However, since A and B together determine which combinations are unwanted, the filtering has to be done on the full grid, before the targets are specified.
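
To see the size of the redundancy, the grid can be inspected outside the pipeline. This is a minimal sketch in plain tidyr/dplyr using the same A and B values as the example above; the names grid, n_built and n_needed are made up for illustration.

```r
library(tidyr)
library(dplyr)

# All A x B pairs, minus the unwanted ones (same filter as in the example).
grid <- expand_grid(A = 1:2, B = 1:4) %>%
  filter(!(A == 2 & B > 2))

# One copy of tarX would be created per row of the grid ...
n_built <- nrow(grid)

# ... but tarX only depends on A, so this many copies would suffice.
n_needed <- nrow(distinct(grid, A))

print(c(built = n_built, needed = n_needed))  # built = 6, needed = 2
```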

Seen in this much cleaner, abstracted form, a solution becomes more obvious: make two calls to tar_map, each operating on only the columns of the parameter grid that its targets use.

param_grid <-
  tidyr::expand_grid(A = 1:2, B = 1:3) %>%
  dplyr::filter(!(A == 2 & B > 2))

list(
  tar_map(
    values = param_grid %>%
      dplyr::select(-B) %>%
      dplyr::distinct(),

    tar_target(tarX, A*3)
  ),

  tar_map(
    values = param_grid,

    tar_target(tarY, A*4 + B^2)
  )
)
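
If tarY should reuse the tarX result for the same A instead of recomputing from A, one possible extension is to put the name of the matching tarX target into the grid as a symbol column. The sketch below is untested against this exact pipeline. It assumes that tar_map names the first set of targets tarX_1 and tarX_2, which can be checked with tar_manifest(). The names grid_x, grid_y and tarX_sym are hypothetical, and the tarY command is changed purely to illustrate the dependency.

```r
# _targets.R (sketch only; grid_x, grid_y and tarX_sym are illustrative names)
library(targets)
library(tarchetypes)
library(dplyr)

param_grid <- tidyr::expand_grid(A = 1:2, B = 1:3) %>%
  filter(!(A == 2 & B > 2))

# Grid for tarX: one row per distinct value of A.
grid_x <- distinct(param_grid, A)

# Grid for tarY: every kept (A, B) row, plus the symbol of the matching tarX
# target. Assumption: tar_map() names those targets tarX_1 and tarX_2.
grid_y <- param_grid %>%
  mutate(tarX_sym = rlang::syms(paste0("tarX_", A)))

list(
  tar_map(
    values = grid_x,
    tar_target(tarX, A * 3)
  ),
  tar_map(
    values = grid_y,
    names = c(A, B),  # build target names from A and B, not the symbol column
    tar_target(tarY, tarX_sym + B^2)  # tarX_sym is replaced by tarX_1 or tarX_2
  )
)
```

The filtering still happens once, on the full grid, so the unwanted combinations never produce a tarY target.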

Perhaps there are other solutions as well. I'd be happy to hear them.