I have two data frames DF1 and DF2 like this.
ID = c(1, 2, 3, 4)
Issues = c('Issue1, Issue4', 'Issue2, Issue5, Issue6', 'Issue3, Issue4', 'Issue1, Issue5')
Location = c('x', 'y', 'z', 'w')
Customer = c('a', 'b', 'c', 'd')
DF1 = data.frame(ID, Issues, Location, Customer)
Root_Cause = c('R1', 'R2', 'R3', 'R4')
List_of_Issues = c('Issue1, Issue3, Issue5', 'Issue2, Issue1, Issue4', 'Issue6, Issue7', 'Issue5, Issue6')
DF2 = data.frame(Root_Cause, List_of_Issues)
I want to compare both the data frames with "Issues" of DF1 and "List_of_Issues" of DF2 and if more than two words in "Issues" column is there in "List_of_Issues" column in DF2, then I want to populate subsequent "Root_Cause" from DF2. My resulting data frame should look like DF3.
ID = c(1, 2, 3, 4)
Issues = c('Issue1, Issue4', 'Issue2, Issue5, Issue6', 'Issue3, Issue4', 'Issue1, Issue5')
Location = c('x', 'y', 'z', 'w')
Customer = c('a', 'b', 'c', 'd')
Root_Cause = c('R2', 'R4', NA, 'R1')
DF3 = data.frame(ID, Issues, Location, Customer, Root_Cause)
Using data.table:
EDIT: I have edited your sample data to account for multi-root cause eventualities. In this data,
ID==1
corresponds to both R2 and R3.Data
Code
Result