Recode values using case_match() with a named character vector


In the dplyr package, recode() has been superseded in favor of case_match(). Is there a way to recode values with case_match() using labels stored in, for example, a named character vector?

For example, with recode() I can store labels in a named character vector (or read them from a CSV file) and splice them in for recoding:

library(dplyr)

# Names are the old values, elements are the new labels
lbls <- c(
    'male' = 'Man',
    'female' = 'Woman'
)

starwars %>%
    select( sex ) %>%
    mutate(
        sex = recode( sex, !!!lbls )
    )

# A tibble: 87 × 1
#   sex  
#   <chr>
# 1 Man  
# 2 none 
# 3 none 
# 4 Man  
# 5 Woman
# ...
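
For the CSV case, the lookup table can be turned into the same kind of named vector. This is a minimal sketch, assuming a hypothetical two-column table with columns `old` and `new`; inline text stands in for the file, and the file name in the comment is made up for illustration.

```r
# Hypothetical lookup table; in practice this might be read.csv("labels.csv").
tbl <- read.csv(
    text = "old,new\nmale,Man\nfemale,Woman",
    stringsAsFactors = FALSE
)

# Names are the old values, elements are the new labels.
lbls <- setNames(tbl$new, tbl$old)
lbls
```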

However, since case_match() requires two-sided formulas (old_values ~ new_value), splicing the named vector in the same way does not work. Is there a way to use stored labels with case_match() as well?


Answer by SamR (best answer)

You can build the two-sided formulas as strings, parse them into expressions, and pass them to case_match() as arguments.

tidyverse approach

As you're using dplyr, let's go all in and build the rules with glue:

(rules <- glue::glue('"{lbl}" ~ "{val}"', lbl = names(lbls), val = lbls))
# "male" ~ "Man"
# "female" ~ "Woman"

You can then turn this character vector into a list of call objects with rlang::parse_exprs(). Then inject the list into the function call as arguments using the splice operator, !!!:

starwars |>
    select(sex) |>
    mutate(
        sex = case_match(
            sex,
            !!!rlang::parse_exprs(rules),
            .default = sex
        )
    )
# # A tibble: 87 × 1
#    sex  
#    <chr>
#  1 Man  
#  2 none 
#  3 none 
#  4 Man  
#  5 Woman
#  6 Man  
#  7 Woman
#  8 none 
#  9 Man  
# 10 Man  
# # ℹ 77 more rows
# # ℹ Use `print(n = ...)` to see more rows
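
To see what the parsing step produces, the rules can be inspected outside of mutate(). This is only an illustrative sketch: the rule strings are typed out by hand rather than generated, and it assumes the rlang package is installed.

```r
library(rlang)

rules <- c('"male" ~ "Man"', '"female" ~ "Woman"')
exprs <- parse_exprs(rules)

# Each element is an unevaluated call to `~`, not yet a formula object.
is.call(exprs[[1]])

# Evaluating the call produces a two-sided formula, which is what
# case_match() expects for each old ~ new pair.
f <- eval(exprs[[1]])
is_formula(f, lhs = TRUE)

f_lhs(f)  # the old value
f_rhs(f)  # the new label
```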

base R approach

We can also do the parsing and splicing in base R. For me it's a little clearer what's going on. We can define rules with sprintf() instead of glue, as suggested by Darren Tsai.

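rules <- c(
    # First element: the column to match on (parsed into the symbol `sex`)
    "sex",
    # Remaining elements: one "old" ~ "new" formula per label
    sprintf('"%s" ~ "%s"', names(lbls), lbls)
)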

To get the character vector into a list of language objects, instead of parse_exprs() we can use str2lang() on each element. The !!! operator in the tidyverse version is a way of calling case_match() with a list of arguments; the base R equivalent is do.call().

starwars |>
    select(sex) |>
    mutate(
        sex = do.call(
            case_match,
            c(
                lapply(rules, str2lang),
                list(.default = sex)
            )
        )
    )
# # A tibble: 87 × 1
#    sex
#    <chr>
#  1 Man
#  2 none
#  3 none
#  4 Man
#  5 Woman
#  <etc>
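
The same do.call() pattern can be tried on a plain vector, without mutate(). This is a minimal sketch, assuming dplyr 1.1.0 or later (where case_match() is available); the vector `x` is made-up sample data, and here it is passed as the first argument directly instead of being parsed from a string.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where case_match() is available

lbls <- c(male = "Man", female = "Woman")
x <- c("male", "none", "female")  # made-up sample values

# One language object per "old" ~ "new" rule
rules <- lapply(sprintf('"%s" ~ "%s"', names(lbls), lbls), str2lang)

# Equivalent to writing:
# case_match(x, "male" ~ "Man", "female" ~ "Woman", .default = x)
res <- do.call(case_match, c(list(x), rules, list(.default = x)))
res
```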

A note on .default

Note that unlike recode(), which leaves unmatched values unchanged here, case_match() needs an explicit .default argument to do the same. From the documentation:

The value used when values in .x aren't matched by any of the LHS inputs. If NULL, the default, a missing value will be used.

If .default is not provided, any value without a matching rule (e.g. "none") becomes NA.
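
A small sketch of the difference, using a made-up vector instead of the starwars data and assuming dplyr 1.1.0 or later:

```r
library(dplyr)  # assumes dplyr >= 1.1.0

x <- c("male", "none", "female")  # made-up sample values

# Without .default, the unmatched "none" becomes NA.
without_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman")

# With .default = x, unmatched values keep their original value.
with_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman", .default = x)

without_default
with_default
```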

Answer by Darren Tsai

You can transform the named vector into a list of formulas in advance.

rules <- Map(reformulate, shQuote(lbls), shQuote(names(lbls)))

# $`'Man'`
# "male" ~ "Man"
# 
# $`'Woman'`
# "female" ~ "Woman"

starwars %>%
  select( sex ) %>%
  mutate(
    sex = case_match(sex, !!!rules, .default = sex)
  )

# # A tibble: 87 × 1
#    sex  
#    <chr>
#  1 Man  
#  2 none 
#  3 none 
#  4 Man  
#  5 Woman
#  6 Man  
#  7 Woman
#  8 none 
#  9 Man  
# 10 Man  
# ℹ 77 more rows