Trouble defining functions as outputs of other "factory functions" within an R package


Edit - Partial Possible Solution: There is a clunky solution I wrote in a reply. I am not sure if this is the correct way to achieve something which I figure is rather common.


I have been writing functions which follow the same template.

Given a couple of inner functions and some (not all) of their arguments, an "outer function" combines sums and products of the inner functions. The manner in which the outer function combines things is predetermined, but users should be free to set arguments of the inner functions. Since I cannot anticipate the functions, I cannot anticipate their arguments, so writing a dedicated wrapper for each case is not feasible.

In order to maintain and improve code, I thought it was sensible to create a "factory function" which can produce the desired outer functions. The factory function would help me create other useful functions for my package as well as allowing users to create their own functions of the same type. Following this, I created such a function. When I load the factory function in a regular R session, run it, and name its output, the resulting named function works exactly as I desire.
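For readers unfamiliar with the pattern, here is a minimal sketch of a function factory in plain R. It uses a closure rather than the approach in the example repo, and the names make_shifted and shift are hypothetical, chosen only for illustration:

```r
# Hypothetical factory: returns a new function that shifts `x` before
# calling `f`. All other arguments are passed through unchanged via `...`.
make_shifted <- function(f, shift) {
  force(f)      # evaluate the arguments now, so the returned closure
  force(shift)  # does not depend on lazily evaluated promises
  function(x, ...) f(x - shift, ...)
}

# Create an "outer function" from stats::dgeom and call it.
dgeom_shifted <- make_shifted(stats::dgeom, shift = 2)
dgeom_shifted(2, prob = 0.3)  # same value as stats::dgeom(0, prob = 0.3)
```

In an interactive session this kind of assignment works as soon as the factory has been defined, which matches the behaviour described above.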

However, the very same assignment of the output of a factory function does not work when I try to do it inside a package using devtools::load_all().

I have done some troubleshooting and experimentation and have narrowed things down a bit. Some experiments and results are in a public GitHub repo I made to learn more about my issue. This is my first encounter with formal concepts such as components of functions and environments, so my terminology and ideas might be off. What I believe to be a reprex is at the bottom.

Basic troubleshooting - merely following old examples and instructions

I do not think this is an issue with devtools or my particular package configuration. I have gone through some old threads and blog posts, including 1, 2 and 3.

Experimentation - things enclosed in the example repo

  • dgeom_shifted_manual() in geom_shifted.R: Making a wrapper function for a known function with known arguments presents no issue. This function is a hard-coded example showing some of the desired behaviour.
  • dgeom_shifted_programmatic() in geom_shifted.R: This function produces exactly the type of behaviour I desire. Given a function (later on, multiple functions) as well as transformations of some args, a factory function (shift_support()) outputs my desired outer function. In this case, dgeom_shifted_programmatic() is my outer function and it works as expected in an interactive R session. Specifically, if I comment out its creation and run load_all() and subsequently uncomment and run its creation, I get the result I want. If, on the other hand, I leave it uncommented and run load_all(), the error is:

Error in load_all(): Failed to load R/geom_shifted.R
Caused by error in shift_support(): could not find function "shift_support"
Run rlang::last_trace() to see where the error occurred.

R apparently does not find the factory function in this case.

  • util_caller2() in geom_shifted.R: This example shows that if a call to a utility function (even one not exported from the namespace) is wrapped within another function, it works as desired.

  • util_caller3 in geom_shifted.R: Directly setting an object in geom_shifted.R to the output of a function defined in support_utils.R results in the same "could not find function" error.

  • stats_util_caller1 in geom_shifted.R: (ditto)

  • stats_util_caller2 in geom_shifted.R: We have the same issue, even when the factory/utility function is in the very same file.

  • uses_stats_util_caller2 in geom_shifted.R: Shows that a dummy wrapper function runs but does not solve the problem in any meaningful way.
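The contrast between the wrapped and unwrapped cases in the list above can be reproduced outside a package. A minimal sketch, assuming a fresh R session and using hypothetical names (helper, caller): a function body is only evaluated when the function is called, whereas a top-level call is evaluated immediately.

```r
# Illustrative names only; run in a fresh R session, outside any package.

# Defining `caller` is fine: `helper` is not looked up until `caller` runs.
caller <- function() helper(1)

# A top-level call, by contrast, is evaluated immediately and fails,
# because `helper` does not exist yet. The error message is captured.
early <- tryCatch(helper(1), error = function(e) conditionMessage(e))
print(early)

# Define the helper after both of the statements above.
helper <- function(x) x + 1

# Calling `caller` now works, because `helper` exists at call time.
late <- caller()
print(late)
```

This suggests that wrapping the call only postpones the lookup of the helper until the wrapper is called; it does not change where the helper is looked up.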

Conclusion

Overall, this does not seem like a namespace issue. I think that the problem could be solved by altering the environment in which my factory function is called. I believe this is what I am implicitly doing when I wrap it in a dummy function, which is why I do not get the "not found" error. However, I do not yet know enough about environments and packages to get this working properly (in a single step with no finagling or creating a separate "utils" companion package).

I also have the feeling that even if this can be solved as I describe, this is probably not the canonical way of doing things. My goal sounds like it should be common in functional programming, but I have not found anything about it in the specific case of making a package. Zooming out a bit, is there a deeper design issue here? Is this not the right way of thinking about this problem?

Many thanks!

Minimal reprex based on the shared repo (might not get to the heart of the problem)

Create a new package. In a source file, paste:

uses_stats_util_caller2 <- function() {
  stats_util_in_same_file(stats::dgeom)
}

### UNCOMMENTING THIS AND RUNNING load_all() RESULTS IN ERROR: "...could not find function..."
### Can you use this to make an ugly solution? No, the can has just been kicked
### down the road
# uses_uses_stats_util_caller2 <- uses_stats_util_caller2()

stats_util_in_same_file <- function(distribution_function) {
  return(distribution_function)
}

In terminal, run load_all().

This does not do anything useful, but does not result in an error.

Now, in the terminal, run the assignment "uses_uses_stats_util_caller2 <- uses_stats_util_caller2()". This works exactly as I desire.

Now, uncomment the codeblock with the same assignment, which just worked, and re-run load_all() in terminal. This results in an error.

If you put stats_util_in_same_file() into a separate source file and export it (the more realistic scenario), the same results hold.

Ricardo Semião

This might be a problem with the order in which R runs the code within a package. Sections 6.4 and 7.4 of "R Packages (2e)" by Hadley Wickham and Jennifer Bryan explain this in more detail.

Section 7.4 talks about using an environment to define utilities. You don't need to use an environment, but the discussion of where one would define such utilities is still relevant:

The definition of the environment should happen as a top-level assignment in a file below R/. [...] As for where to place this definition, there are two considerations: Define it before you use it. If other top-level calls refer to the environment, the definition must come first when the package code is being executed at build time. This is why R/aaa.R is a common and safe choice. [...]

In your package, shift_support is defined in "support_utils.R", which is run (alphabetically) after "geom_shifted.R", the file where it is used, so it is not yet defined at the point of use.
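The ordering effect can be imitated without building a package. The sketch below is only an approximation of what happens at load time: it writes two hypothetical files into a temporary directory and sources them in sorted order into one environment. The helper load_sorted and the function shift_support_demo are made-up names, not part of devtools or of the package in question.

```r
# Simulate a package's R/ directory with two hypothetical files.
r_dir <- file.path(tempdir(), "order_demo_R")
unlink(r_dir, recursive = TRUE)
dir.create(r_dir)
writeLines("shifted <- shift_support_demo(stats::dgeom)",
           file.path(r_dir, "geom_shifted.R"))
writeLines("shift_support_demo <- function(f) f",
           file.path(r_dir, "support_utils.R"))

# Source every file in sorted order into a single environment, roughly
# as happens when package code is executed file by file.
load_sorted <- function(dir) {
  env <- new.env()
  for (f in sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))) {
    sys.source(f, envir = env)
  }
  env
}

# geom_shifted.R sorts first, so its top-level call fails.
res1 <- tryCatch(load_sorted(r_dir), error = function(e) conditionMessage(e))
print(res1)

# Rename the utility file so that it sorts first, then load again.
file.rename(file.path(r_dir, "support_utils.R"),
            file.path(r_dir, "aaa_support_utils.R"))
res2 <- load_sorted(r_dir)
is.function(res2$shifted)
```

The same reasoning covers the single-file case in the question, where the top-level call appears above the definition it depends on.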

This is a hypothesis; I have not tested whether it is the case. Try renaming "support_utils.R" to something like "aaa.R" or "aaa_support_utils.R".
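If renaming files is unappealing, another option that may work is roxygen2's @include tag, which records the required file order in the Collate field of DESCRIPTION when the documentation is regenerated. This is a sketch under the assumption that the package uses roxygen2 and that the file names are as in the question; I have not tested it on this package either.

```r
# At the top of R/geom_shifted.R (file name taken from the question).
# The tag asks roxygen2 to collate support_utils.R before this file.

#' @include support_utils.R
NULL

# Top-level calls to shift_support() would go below this point, once
# the documentation has been regenerated and the package reloaded.
```

After adding the tag, regenerate the documentation (for example with devtools::document()) so that the Collate field is updated before the package is loaded again.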

user23471877

A kludgy solution, which might have issues (performance or otherwise).

Within factory_function_utils.R:

#' @rdname factory_function_utils
#' @export
shift_support <- function(args, distribution_function, shift_args, env = parent.frame()) {
  body = substitute({
    arguments = c(as.list(environment()))
    temp_str = paste(shift_args, collapse = ";")
    transformed_args = within(data = arguments,
                              expr = {
                                eval(parse(text = temp_str))
                              })
    do.call(what = distribution_function,
            args = transformed_args)
  })
  args <- as.pairlist(args)
  eval(call("function", args, body), env)
}

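Within geom_shifted.R:

# Not exported: calling this builds the shifted density function on demand,
# so shift_support() is only looked up at call time, not at load time.
#' @rdname geom_shifted
dgeom_shifted_programmatic <- function(...) {
  shift_support(args = formals(stats::dgeom),
                distribution_function = stats::dgeom,
                shift_args = list("x = x - 2"))
}

# Exported wrapper with an explicit signature; it builds the function and
# immediately calls it with the user's arguments.
#' @rdname geom_shifted
#' @export
dgeom_shifted_programmatic_wrapper <- function(x, prob, log = FALSE) {
  dgeom_shifted_programmatic()(x = x, prob = prob, log = log)
}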

dgeom_shifted_programmatic_wrapper() is loaded when I load the package, is not in the global environment, and works as I expect it to.

This is based on a hint from section 6.4.3 of R Packages (2e), found thanks to @ricardo-semião-e-castro. I am still keeping this issue open because this might not be a proper solution.

I have created a different public repo here, which also shows a bit of an extension. Multiple functions and an arbitrary number of transformations to arguments can be fed to a factory function. This is shown in gse.R. This is very similar to my actual use case. I am still unsure if this is a good solution, but it works.