Using the import package with drake


Discovering the drake package has been one of my best recent finds as an R user. However, one drawback I see in terms of reproducibility is that the workspace gets cluttered with functions that are merely helpers.

Nobody can easily tell whether these sourced functions clash with one another, or whether the order of library() calls matters. I know about the conflicted package, but it only handles conflicts between packages. I also know that the unit of code in R is supposed to be a package, but it feels odd to turn an analysis made of a handful of files like preprocessing.R and training.R into a package. Name clashes start to appear early anyway, and I have never seen anyone present a clean approach to this in R.

There is, however, the import package, which lets you cherry-pick functions from packages as well as functions and variables from other R files. Say you have a function a in a.R: if you import it with import, a becomes accessible, and all of its dependencies stay available to a without being imported themselves, which provides useful isolation.
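The kind of isolation described above can be approximated in base R with a closure. This is only an analogy, not the import package's own code: helper() and a() are hypothetical stand-ins for the contents of a.R. a() can still call helper() through its enclosing environment, while helper never shows up in the workspace.

```r
# Hypothetical stand-in for a.R: helper() stays private, a() is exposed.
a <- local({
  helper <- function(x) x * 2   # dependency of a(), kept out of the workspace
  function(x) helper(x) + 1     # a() finds helper() via its enclosing environment
})

a(3)                                                        # 7
exists("helper", inherits = FALSE)                          # FALSE: not in the workspace
exists("helper", envir = environment(a), inherits = FALSE)  # TRUE: visible to a()
```

If drake only inspects functions that live in make()'s environment, a helper hidden like this would not be tracked, which may be related to the behavior described below.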

I tried using the import package with drake, but drake does not detect when the dependencies of imported functions change, which defeats its main purpose. Does anyone know a way to tell drake to "drill down" into these functions, or any other way to make this work? Thanks in advance!

Answer

By design, drake only tracks the functions in the environment of make(), which you can set with the envir argument (plus namespaced functions called as pkg::fun(), though building that capability was a mistake). By default, envir is simply the calling environment (parent.frame()). So when you use import::from(), set .into = "" so that the imported functions land in drake's environment.

ls()
#> character(0)
import::from(dplyr, mutate, .into = "")
ls()
#> [1] "mutate"
library(drake)
plan <- drake_plan(x = mutate(mtcars, x = 1))
vis_drake_graph(plan)

Created on 2020-09-05 by the reprex package (v0.3.0)

Incidentally, you just handed us an excellent alternative to envir = getNamespace("yourPackage") from https://github.com/ropensci/drake/issues/1286#issuecomment-649088321, the latter of which is limited if you want to pull functions from multiple sources. So thanks! Let's spread the word about this workaround.
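To make the multiple-sources point concrete, here is a minimal sketch of a swapped-in, plain base-R alternative (not drake's or import's documented workflow): functions from two hypothetical sources, source_a and source_b, are collected into one environment that could then be passed to make() through its envir argument. The names analysis_env, a and b are illustrative assumptions.

```r
# Hypothetical functions from two different sources (e.g. two script files)
source_a <- local({
  helper <- function(x) x * 2          # private to a()
  list(a = function(x) helper(x) + 1)
})
source_b <- list(b = function(x) x - 1)

# One environment holding everything the plan should see
analysis_env <- new.env(parent = globalenv())
invisible(list2env(c(source_a, source_b), envir = analysis_env))

sort(ls(analysis_env))   # "a" "b"
# With drake loaded, the plan could then be run as:
# make(plan, envir = analysis_env)
```

Unlike envir = getNamespace("yourPackage"), this environment is not tied to a single package, so functions from several files or packages can sit side by side.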