I often need to `spread` multiple value columns, as in this question, and I do it often enough that I'd like to be able to write a function that does this.
For example, given the data:
    library(tibble)

    set.seed(42)
    dat <- tibble(id = rep(1:2, each = 2),
                  grp = rep(letters[1:2], times = 2),
                  avg = rnorm(4),
                  sd = runif(4))

    > dat
    # A tibble: 4 x 4
         id grp          avg        sd
      <int> <chr>      <dbl>     <dbl>
    1     1 a      1.3709584 0.6569923
    2     1 b     -0.5646982 0.7050648
    3     2 a      0.3631284 0.4577418
    4     2 b      0.6328626 0.7191123
I'd like to create a function that returns something like:
    # A tibble: 2 x 5
         id      a_avg      b_avg      a_sd      b_sd
      <int>      <dbl>      <dbl>     <dbl>     <dbl>
    1     1  1.3709584 -0.5646982 0.6569923 0.7050648
    2     2  0.3631284  0.6328626 0.4577418 0.7191123
How can I do that?
We'll return to the answer given in the linked question, but for the moment let's start with a more naive approach.
One idea would be to `spread` each value column individually and then join the results. (I used a `full_join` just in case we run into situations where not all combinations of the join columns appear in every piece.)

Let's start with a function that works like `spread` but lets you pass the `key` and `value` columns as character strings. The key ideas here are to unquote the arguments `key_col` and `value_cols[i]` with the `!!` operator, and to use the `sep` argument of `spread` to control the resulting value column names.

If we wanted to convert this function to accept unquoted arguments for the key and value columns, we would only need to modify it slightly.
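A minimal sketch of the spread-each-column-then-`full_join` idea, with a character-argument helper and an unquoted wrapper around it, might look like the following. The names `spread_chr` and `spread_nq` are hypothetical, and the sketch assumes tidyr 1.0 or later plus dplyr 1.0 or later (for `rename_with()`). It deliberately swaps in two techniques that differ from the ones described above: it suffixes the spread columns with `rename_with()` instead of relying on `spread()`'s `sep` argument (which always prepends the key column's name), so the names come out as `a_avg`, `b_avg`, and so on; and the wrapper turns the captured arguments into strings with `rlang::as_name()` rather than `tidyselect::vars_select()`.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(rlang)

# Example data from the question
set.seed(42)
dat <- tibble(id = rep(1:2, each = 2),
              grp = rep(letters[1:2], times = 2),
              avg = rnorm(4),
              sd = runif(4))

# Hypothetical helper: key and value columns are passed as character strings.
# Each value column is spread on its own, then the pieces are full-joined.
spread_chr <- function(data, key_col, value_cols, fill = NA,
                       convert = FALSE, drop = TRUE, sep = "_") {
  # every column that is neither the key nor a value column identifies a row
  id_cols <- setdiff(names(data), c(key_col, value_cols))

  pieces <- map(value_cols, function(v) {
    data %>%
      select(all_of(c(id_cols, key_col, v))) %>%
      # !! injects the symbols built from the strings
      spread(key = !!sym(key_col), value = !!sym(v),
             fill = fill, convert = convert, drop = drop) %>%
      # suffix the new columns with the value column's name: "a" -> "a_avg"
      rename_with(function(nm) paste0(nm, sep, v), .cols = -all_of(id_cols))
  })

  # full_join keeps rows whose id/key combination is missing from some piece
  reduce(pieces, function(x, y) full_join(x, y, by = id_cols))
}

# Hypothetical unquoted wrapper: capture bare column names as quosures,
# convert them to strings, and delegate to spread_chr().
spread_nq <- function(data, key_col, ..., fill = NA,
                      convert = FALSE, drop = TRUE, sep = "_") {
  key_chr <- as_name(enquo(key_col))
  value_chr <- unname(map_chr(enquos(...), as_name))
  spread_chr(data, key_chr, value_chr, fill = fill,
             convert = convert, drop = drop, sep = sep)
}

# Both calls give the columns id, a_avg, b_avg, a_sd, b_sd
spread_chr(dat, "grp", c("avg", "sd"))
spread_nq(dat, grp, avg, sd)
```

Because the pieces are combined with `full_join()`, an id/key combination that is missing for one value column still yields a row, with `NA` in the cells that piece could not fill.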
The change here is that we capture the unquoted arguments with `rlang::quos` and `rlang::enquo` and then simply convert them back to characters using `tidyselect::vars_select`.

Returning to the solution in the linked question that uses a sequence of `gather`, `unite` and `spread`, we can use what we've learned to wrap that sequence in a function as well. This relies on the same rlang techniques as the last example. We use some unusual names like `..var..` for our intermediate variables in order to reduce the chances of name collisions with existing columns in our data frame.

Also, we use the `sep` argument in `unite` to control the resulting column names, so in this case when we `spread` we force `sep = NULL`.