Dynamically provide argument to function inside mutate


First off - my apologies if this has been asked before, I've looked and haven't been able to find anything that matches what I'm trying to do.

I'm trying to create a function that bins data according to a user-specified column in a data frame. To do this, I'm using mutate() from dplyr and cut() from base R. However, I can't figure out how to take a column name that is passed in as a function argument and use it inside cut() (which is itself called inside mutate()).

I've spent several hours looking through this and this but still haven't figured it out. My understanding is that foo(), bar() and the final line in the code below should all produce the same output. However, both functions throw errors (a different one each), while the final line, which isn't wrapped in a function and uses a hard-coded column name, works fine.

What's going on here? Why does foo() produce a different output than bar()? And how do I correctly use lazyeval to allow the correct behavior in a function?

library(dplyr)
library(lazyeval)

foo <- function(data, col, bins){
    by = lazyeval::interp(quote(x), x = as.name(col))
    print(paste0("typeof(by): ", typeof(by)))
    print(paste0(" by: ", by))

    df <- data %>%
      dplyr::mutate(bins = cut(by,
        breaks = bins,
        dig.lab = 5,
        include.lowest = T))
    df
}

bar <- function(data, col, bins){
  df <- data %>%
    dplyr::mutate(bins = cut(lazyeval::interp(quote(x), x = as.name(col)),
      breaks = bins,
      dig.lab = 5,
      include.lowest = T))
  df
}

#produce sample data and bins list
df <- expand.grid(temp=0:8,precip=seq(0.7,1.3,by=0.1))
df$rel <- seq(40,100,length=63)
bins <- seq(40,100,by=10)

foo(df,"rel",bins) # produces "Error: 'rel' not found"
bar(df,"rel",bins) # produces "Error: 'x' must be numeric"

# but this works
dplyr::mutate(df, bins = cut(rel, breaks = bins, dig.lab = 5, include.lowest = T))

Best answer

As @aosmith mentioned in their comment, the solution is to use mutate_(bins = interp(~cut(x, bins, dig.lab = 5, include.lowest = TRUE), x = as.name(col))). Using mutate_ instead of mutate allows us to use standard evaluation.

It's easiest to see what's going on with interp and cut if we call interp outside of mutate_. (It executes the same either way.) Assuming col == "rel",

call = interp(~cut(x, bins, dig.lab = 5, include.lowest = TRUE), x = as.name(col))

will give

~cut(rel, bins, dig.lab = 5, include.lowest = TRUE)
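To make the difference from the question's bar() concrete, here is a minimal sketch, assuming the lazyeval package is installed. The variable names sym, res and f are only illustrative. It suggests why passing interp(quote(x), ...) straight to cut() fails, while interpolating into a formula works: in the first case cut() receives a symbol rather than the column's values.

```r
library(lazyeval)

col <- "rel"

# What bar() effectively hands to cut(): interp() on a bare quoted name
# returns a symbol, not the values stored in the column.
sym <- lazyeval::interp(quote(x), x = as.name(col))
typeof(sym)

# cut() rejects it, because a symbol is not a numeric vector.
res <- tryCatch(cut(sym, breaks = c(40, 50, 60)), error = function(e) e)
inherits(res, "error")

# Interpolating into a formula keeps the whole cut() call unevaluated, so
# mutate_() can evaluate it later with `rel` resolved against the data frame.
f <- lazyeval::interp(~cut(x, bins), x = as.name(col))
f
```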

Passing this formula to mutate_ lets us follow the examples provided here exactly:

data %>% mutate_(bins = call)

This gives the correct result.

You could also allow the user to provide a column name that replaces "bins":

data %>% dplyr::mutate_(.dots = setNames(list(call), binName))
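Putting the pieces together, a complete function might look like the sketch below. The names bin_column, breaks and bin_name are hypothetical, chosen for illustration. It assumes a dplyr version in which mutate_() is still available; mutate_() was deprecated in later dplyr releases, so expect a deprecation warning there.

```r
library(dplyr)
library(lazyeval)

# Hypothetical helper: bin the column named by `col` and store the result
# in a new column whose name is given by `bin_name`.
bin_column <- function(data, col, breaks, bin_name = "bins") {
  # Build the formula ~cut(<col>, breaks, ...) with the column name spliced in.
  # `breaks` is left as a name; it is looked up in the formula's environment
  # (this function) when mutate_() evaluates the formula.
  call <- lazyeval::interp(
    ~cut(x, breaks, dig.lab = 5, include.lowest = TRUE),
    x = as.name(col)
  )
  # .dots takes a named list of expressions; the name becomes the new column.
  dplyr::mutate_(data, .dots = setNames(list(call), bin_name))
}

# Sample data from the question
df <- expand.grid(temp = 0:8, precip = seq(0.7, 1.3, by = 0.1))
df$rel <- seq(40, 100, length.out = 63)
breaks <- seq(40, 100, by = 10)

binned <- bin_column(df, "rel", breaks, bin_name = "rel_bin")
head(binned)
```

The result should match calling cut() on df$rel directly with the same arguments, which is an easy way to check the function on your own data.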