I have two dataframes, both have a Last_Name column. First dataframe has a column Contains_First_Name and the second has a column called First_Name. I want to join the two on the exact spelling of Last_Name and a substring match of Contains_First_Name and First_Name (where First_Name is a substring of Contains_First_Name.) Please see an example below.

library(dplyr)
library(stringr)

# Create df1
Last_Name <- c("Smith", "Jones", "Adams", "Rogers", "Lee", "Lee", "Lee")
Contains_First_Name <- c("Kimberly Nicole", "Patrick L", "Johnson Ann", "Rick", "McAdams Jennifer Marie", "Kirk", "Kirk B")
Account_Number <- c("123", "345", "678", "901", "234", "567", "890")

df1 <- data.frame(Last_Name, Contains_First_Name, Account_Number)

# Create df2
Last_Name <- c("Smith", "Jones", "Adams", "Lee", "Lee")
First_Name <- c("Kimberly", "Patrick", "Ann", "Jennifer", "Kirk")

df2 <- data.frame(Last_Name, First_Name)

Resulting dataframes:

> df1
  Last_Name    Contains_First_Name Account_Number
1     Smith        Kimberly Nicole            123
2     Jones              Patrick L            345
3     Adams            Johnson Ann            678
4    Rogers                   Rick            901
5       Lee McAdams Jennifer Marie            234
6       Lee                   Kirk            567
7       Lee                 Kirk B            890
> df2
  Last_Name First_Name
1     Smith   Kimberly
2     Jones    Patrick
3     Adams        Ann
4       Lee   Jennifer
5       Lee       Kirk

What I want to end up with:

> df3
  Last_Name    Contains_First_Name Account_Number First_Name
1     Smith        Kimberly Nicole            123 Kimberly
2     Jones              Patrick L            345 Patrick
3     Adams            Johnson Ann            678 Ann
4       Lee McAdams Jennifer Marie            234 Jennifer
5       Lee                   Kirk            567 Kirk
6       Lee                 Kirk B            890 Kirk

I tried this:

df3 <-
  filter(df1,
         Last_Name %in% df2$Last_Name,
         str_detect(Contains_First_Name, paste(df2$First_Name, collapse = "|")))

Getting the following error:

Error in match.arg(method) : 'arg' must be NULL or a character vector

I also explored fuzzyjoin library but could not figure out how to join on two variables with two different join types (exact and substring.) I saw a similar question which does not appear to have an answer: Merge two dataframes based on an exact match in one column and match within an error in another column in R. Any advice is greatly appreciated. Thank you.

1

There are 1 best solutions below

3
dufei On BEST ANSWER

I'd say you have two options: either use an equi-join on only the first column and filter later or use fuzzyjoin, as you described:

# Approach 1: Match all, filter later

inner_join(df1, df2, join_by(Last_Name), relationship = "many-to-many") |> 
  filter(str_detect(Contains_First_Name, First_Name))
#> # A tibble: 6 × 4
#>   Last_Name Contains_First_Name    Account_Number First_Name
#>   <chr>     <chr>                  <chr>          <chr>     
#> 1 Smith     Kimberly Nicole        123            Kimberly  
#> 2 Jones     Patrick L              345            Patrick   
#> 3 Adams     Johnson Ann            678            Ann       
#> 4 Lee       McAdams Jennifer Marie 234            Jennifer  
#> 5 Lee       Kirk                   567            Kirk      
#> 6 Lee       Kirk B                 890            Kirk

# Approach 2: fuzzyjoin

fuzzyjoin::fuzzy_inner_join(
  df1,
  df2,
  by = c("Last_Name" = "Last_Name", "Contains_First_Name" = "First_Name"),
  match_fun = list(`==`, \(x, y) str_detect(x, y))
) |> 
  select(!Last_Name.y) |> 
  rename(Last_Name = Last_Name.x)
#> # A tibble: 6 × 4
#>   Last_Name Contains_First_Name    Account_Number First_Name
#>   <chr>     <chr>                  <chr>          <chr>     
#> 1 Smith     Kimberly Nicole        123            Kimberly  
#> 2 Jones     Patrick L              345            Patrick   
#> 3 Adams     Johnson Ann            678            Ann       
#> 4 Lee       McAdams Jennifer Marie 234            Jennifer  
#> 5 Lee       Kirk                   567            Kirk      
#> 6 Lee       Kirk B                 890            Kirk

Created on 2024-01-06 with reprex v2.0.2