How to create new variables based on an external named list/vector of computations in dplyr


Imagine I want to do the following operation:

library(dplyr)
mtcars %>%
    group_by(x = am) %>%
    summarize(y = sum(vs),
              q = sum(mpg),
              p = sum(mpg/vs))

which yields:

#> # A tibble: 2 × 4
#>       x     y     q     p
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     0     7  326.   Inf
#> 2     1     7  317.   Inf

However, I would like to do the groupings and the summary based on these two external vectors:

x_groups <- c("x" = "am")
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")

How can I have the same result but through a programmatic, non-standard evaluation approach?

MrFlick On BEST ANSWER

You can parse your strings into expressions. The grouping part is easy: the parsed expression can be used as is. For the summarize step, each parsed expression also has to be wrapped in a call to sum(). You can do both like this:

grpexpr <- rlang::parse_exprs(x_groups)
sexpr <- rlang::parse_exprs(y_now) |> lapply(function(x) bquote(sum(.(x))))

Since both results are named lists, you can inject them into the calls with !!!, which splices each element in as a named argument:

mtcars %>%
  group_by(!!!grpexpr) %>%
  summarize(!!!sexpr)
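
To check what actually gets spliced in, the following sketch rebuilds the two lists, prints the summary expressions, and compares the result against the hand-written pipeline from the question. The object names programmatic and manual are only illustrative, and the native pipe |> assumes R 4.1 or later.

```r
library(dplyr)

x_groups <- c("x" = "am")
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")

# Parse the strings into expressions; the vector names carry over to the lists
grpexpr <- rlang::parse_exprs(x_groups)
sexpr <- rlang::parse_exprs(y_now) |> lapply(function(x) bquote(sum(.(x))))

# Inspect what will be spliced in: a named list of calls to sum()
print(sexpr)

programmatic <- mtcars %>%
  group_by(!!!grpexpr) %>%
  summarize(!!!sexpr)

# Hand-written version from the question, for comparison
manual <- mtcars %>%
  group_by(x = am) %>%
  summarize(y = sum(vs), q = sum(mpg), p = sum(mpg / vs))

# Compare the two tables column by column
all.equal(as.data.frame(programmatic), as.data.frame(manual))
```

Because the names of the lists become the argument names, renaming an output column only requires changing the name in x_groups or y_now.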
Alberto Agudo Dominguez On

There are two ways to solve this, depending on whether you are able to modify the inputs. If you are allowed to create a different input, such as a list, I would opt for Approach 1.

Approach 1:


Replace y_now with a list of the computations you need, defusing each expression by wrapping it in rlang::expr(). Then adapt group_by() and summarise() to take external inputs: use the := notation in group_by() to set the column name programmatically, and !!! in summarise() to splice in the defused expressions. This is how it would look:

x_groups <- c("x" = "am")
y_now <- list(y = rlang::expr(sum(vs)), q = rlang::expr(sum(mpg)), p = rlang::expr(sum(mpg/vs)))
mtcars %>% 
  group_by(!!sym(names(x_groups)) := !!as.name(x_groups)) %>% 
  summarise(!!!y_now)
#> # A tibble: 2 × 4
#>       x     y     q     p
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     0     7  326.   Inf
#> 2     1     7  317.   Inf

Approach 2:


In this case you cannot create a different input and have to work with what you've been given. The goal is to turn the vector into the same kind of object as the list y_now from Approach 1: convert the vector into a list, then turn each string into a call by wrapping it in sum() and parsing it with str2lang(). After that, apply the same injection operators as in Approach 1.

x_groups <- c("x" = "am")
y_now  <- c("y" = "vs", "q" = "mpg", "p" = "mpg / vs")
y_now <- as.list(y_now) %>% 
  purrr::map(\(variable) str2lang(paste0("sum(", variable, ")")))
mtcars %>% 
  group_by(!!sym(names(x_groups)) := !!as.name(x_groups)) %>% 
  summarise(!!!y_now)
#> # A tibble: 2 × 4
#>       x     y     q     p
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     0     7  326.   Inf
#> 2     1     7  317.   Inf
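
If the same pattern is needed in several places, it could be wrapped in a small function. The sketch below is one possible way to do that, not part of the original answer: summarise_sums() is a hypothetical helper name, and it builds each call with base R's call() on the parsed expression instead of pasting "sum(" around the string.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr or rlang
summarise_sums <- function(data, groups, sums) {
  # lapply() keeps the names of the character vectors,
  # so the results can be spliced in as named arguments
  group_exprs <- lapply(groups, str2lang)
  # Wrap each parsed expression in a call to sum() without pasting strings
  sum_exprs <- lapply(sums, function(s) call("sum", str2lang(s)))
  data %>%
    group_by(!!!group_exprs) %>%
    summarise(!!!sum_exprs)
}

res <- summarise_sums(
  mtcars,
  groups = c("x" = "am"),
  sums = c("y" = "vs", "q" = "mpg", "p" = "mpg / vs")
)
print(res)
```

Building the call from the parsed expression avoids the string concatenation step, but either route ends with the same named list of calls being spliced into summarise().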
TarJae On

To methodically follow the process (though it may not adhere strictly to the DRY principle):

Generally we use the !! (bang-bang) operator to unquote, that is, to inject a value into an expression captured by non-standard evaluation (NSE) in tidyverse functions.

With !!names(y_now)[x] := sum(!!sym(y_now[[x]])) we create a column named after the x-th element of y_now (for example y when x is 1) that holds the sum of the column that element refers to (here vs).

The issue arises with element 3 of y_now: there is no column called mpg/vs, so sym() cannot be used, and we parse and evaluate the string instead: sum(eval(parse(text = y_now[[3]])))

library(dplyr)
library(rlang)

x_groups <- c("x" = "am")
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")


mtcars %>% 
  group_by(!!sym(x_groups[[1]])) %>% 
  summarize(
    !!names(y_now)[1] := sum(!!sym(y_now[[1]])),
    !!names(y_now)[2] := sum(!!sym(y_now[[2]])),
    !!names(y_now)[3] := sum(eval(parse(text = y_now[[3]])))
  )

# A tibble: 2 x 4
     am     y     q     p
  <dbl> <dbl> <dbl> <dbl>
1     0     7  326.   Inf
2     1     7  317.   Inf
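
The reason the third element needs different treatment can be seen by comparing what sym() and a parser return for the same string. This is a small illustrative sketch; the variable names as_symbol and as_parsed are made up for the example.

```r
library(rlang)

# sym() creates one symbol whose name is literally "mpg/vs",
# so dplyr would look for a single column with that name
as_symbol <- sym("mpg/vs")

# parse_expr() parses the string as R code: a call to `/` with mpg and vs
as_parsed <- parse_expr("mpg/vs")

is.symbol(as_symbol)                # a single name, not a division
is.call(as_parsed)                  # a call that can be evaluated on the data
identical(as_parsed, quote(mpg / vs))
```

Parsing therefore works for plain column names and for computed expressions alike, which is why the other answers parse every element rather than mixing sym() and eval(parse()).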