I'm using the targets pipelining system in R and am wondering how best to set up static branching. I have a set of parameters and would like to compute results for most, but not all, of their combinations. Notice that N_source_components and N_target_components aren't used by the agg_neighbourhoods target; they are used by other targets that I left out of this example. With the current setup, agg_neighbourhoods is run too many times (targets doesn't understand that not all columns in the values argument of tar_map are relevant to every target, right?). Is there a smarter way?
I already tried nesting another tar_map call within the one shown here, and moving N_source_components and N_target_components into it. This fixes the redundant executions of agg_neighbourhoods, but it doesn't let me filter out undesirable combinations the way I'm doing now, because the value of query isn't known at 'compilation' time.
Many thanks :)
tar_map(
  # One set of targets is generated per row of this grid
  values = tidyr::expand_grid(
    query = c('6369', '6489', '6493'),
    k = c(10, 30, 50),
    d = c(5, 10, 15),
    genelist = c(
      'informativeV15',
      'informativeV15_monotonic',
      'informativeV15_monoreporter'
    ),
    N_source_components = 10L,
    N_target_components = as.integer(c(3, 5))
  ) %>%
    # Drop the combinations that should not be computed
    dplyr::filter(
      !(query %in% c('6369') & N_target_components > 3)
    ),
  # Uses only query, genelist, k and d from the grid
  tar_target(agg_neighbourhoods, {
    f(
      so = tar_read(so_target, branch = e2i(query))[[1]],
      genelist = genelist,
      k = k,
      d = d
    )
  }, iteration = 'list')
)
Hopefully this is helpful to someone. In simpler terms, my problem was that targets were being run needlessly: I need to filter out some parameter combinations, and not every parameter is used by every target. A simpler and more complete example of this scenario would be:
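
A minimal sketch of such a scenario, assuming a hypothetical grid over A and B, a tarX that only uses A, and a tarY that uses both. The values and target commands are illustrative, not the original code:

```r
library(targets)
library(tarchetypes)

# Hypothetical parameter grid; one (A, B) combination is filtered out.
grid <- tidyr::expand_grid(A = c("a1", "a2"), B = c(1L, 2L, 3L))
grid <- dplyr::filter(grid, !(A == "a1" & B > 2L))

# A single tar_map over the whole grid creates one copy of every target per row.
mapped <- tar_map(
  values = grid,
  tar_target(tarX, toupper(A)),     # uses only A
  tar_target(tarY, paste(tarX, B)), # uses A (through tarX) and B
  unlist = TRUE                     # flat list, named by the generated target names
)

# tarX is generated once per row (5 targets), although A has only 2 distinct values.
names(mapped)
```
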
tarX is being run for each value of B, whereas only one evaluation is required. However, since the values of both A and B are informative as to which combinations aren't required, we have to pre-specify the required targets.
Seeing the 'problem' in this much cleaner, abstracted form makes a solution easier to spot: just make two calls to tar_map, each operating on a tailored selection of columns from the parameter grid.
Perhaps there are other solutions as well. I'd be happy to hear them.
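
The two-call approach could look roughly like the sketch below. The grid, the target commands, and the helper column x_target are hypothetical. The sketch also assumes tar_map's default naming, where the target name is followed by an underscore and the row's values, so that the tarX target for A = "a1" is called tarX_a1.

```r
library(targets)
library(tarchetypes)

# Hypothetical parameter grid with one (A, B) combination filtered out.
grid <- tidyr::expand_grid(A = c("a1", "a2"), B = c(1L, 2L, 3L))
grid <- dplyr::filter(grid, !(A == "a1" & B > 2L))

# First call: only the column tarX actually uses, deduplicated,
# so tarX is created once per distinct value of A.
x_targets <- tar_map(
  values = dplyr::distinct(grid, A),
  tar_target(tarX, toupper(A)),
  unlist = TRUE
)

# Second call: the full filtered grid. The extra list column holds the symbol
# of the upstream tarX target for each row (assumes names of the form tarX_<A>).
grid_y <- dplyr::mutate(grid, x_target = rlang::syms(paste0("tarX_", A)))
y_targets <- tar_map(
  values = grid_y,
  names = tidyselect::all_of(c("A", "B")), # keep x_target out of the target names
  tar_target(tarY, paste(x_target, B)),
  unlist = TRUE
)

# In _targets.R, the pipeline would end with the combined list of targets.
list(x_targets, y_targets)
```

Filtering still happens once, on the full grid, before either call; the first call just sees a deduplicated projection of it, so any value of A that survives the filter gets exactly one tarX.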