Mutate df based on a pair of column values' presence in another df in R

465 Views Asked by At

edited - fixed the typos that Frank and ycw pointed out

I am having trouble identifying pairs of values in a dataframe that exist in another data frame. Because my data is in long format and each column contains the same value many times, I'm getting overly broad results when I try to identify matches. I've tried a number of ways and can't get the exact results I need. Given how simple the operation is, I expect I'm missing something very obvious.

For my example, I have a set of person records in long format, df ppl. I also have another set of person observations for specific time periods, in this example, it's absences on certain days dfabs. I want to mutate ppl to create a new column, that will show when a person-day row matches a person-day observation in df dfabs. I keep getting a problem when I try to mutate however, as it's not clear to me how to subset based on two columns at once.

Example below:

name <- c("Bob", "Bob", "Ana","Ana", "Jorge","Jorge")
day <- c(1,2,1,2,1,2)
eval <- c(4,3,5,5,2,5)
ppl <- as.data.frame(cbind(name,day, eval), names=TRUE)

   name day eval
1   Bob   1    4
2   Bob   2    3
3   Ana   1    5
4   Ana   2    5
5 Jorge   1    2
6 Jorge   2    5

nameabs <- c("Bob","Jorge","Cindy")
dayabs <- c(1,2,1)
dfabs <- as.data.frame(cbind(nameabs, dayabs), names=TRUE)
dfabs

  nameabs dayabs
1     Bob      1
2   Jorge      2
3   Cindy      1

The result I want to see is:

    name day eval absent
1   Bob   1    4      1
2   Bob   2    3      0
3   Ana   1    5      0
4   Ana   2    5      0
5 Jorge   1    2      0
6 Jorge   2    5      1

To do this, I've tried using the mutate function:

mutate(ppl, absent = ifelse(((name %in% dfabs$nameabs) & (day %in% dfabs$dayabs)),1,0))

Result:

   name day eval absent
1   Bob   1    4      1
2   Bob   2    3      1
3   Ana   1    5      0
4   Ana   2    5      0
5 Jorge   1    2      1
6 Jorge   2    5      1

This is not right. It lists Bob and Jorge as absent on both days. It seems that in my ifelse() statement, it evaluated the two logical expressions (name %in% dfabs$nameabs) and (day %in% dfabs$dayabs) on separate rows.

Using other functions such as filter() gives me the same problem. I can't figure out how to make it evaluate the entire expression on one row. In general, with and without dplyr, I've been having a lot of trouble with this general problem: how to subset one df based on a pair's (across two columns) presence in another df? I'm working with a much larger and more complex data set, and having a generalizeable answer would be very helpful.

Thanks in advance for the help!

0

There are 0 best solutions below