how to anti_join only if entire row is similar r

289 Views Asked by At

I have two data. frames, one which contains all transactions exported from an internal system (fullDF), and one that contains the transactions that have already been reviewed (reviewDF). Both DFs contain the same format, column headers, and were derived from the same internal system. Columns consist of orig_name, orig_id, orig_fi, orig_address, bene_name, bene_id, bene_fi, bene_address, amount, and date.

I am trying to anti_join duplicate transactions found in both DFs to remove the transactions that have already been reviewed; however, the anti_join cannot be based upon a single column criterion. My goal is to anti_join all duplicates only if all column values in a row are similar.

I tried the following:

anti_join(fullDF, reviewDF, by = "orig_name", "orig_id", "orig_fi", "orig_address", "bene_name", "bene_id", "bene_fi", "bene_address", "amount", "date").

Even though I specified all column names, it seems r is still removing transactions based upon one of the criteria; for instance: if bene_name is a repeat, it will remove all rows with a similar bene_name, even though not all of the transactions with said bene_name have been looked at. More specifically, rows that contain similar bene_name are being removed, even though other fields, such as date or amount, are different.

Is there a better way to do this?

Thank you all for your help!

1

There are 1 best solutions below

0
On

We need to concatenate the by character vectors with c

 anti_join(fullDF, reviewDF, by = c("orig_name", "orig_id", "orig_fi", "orig_address", "bene_name", "bene_id", "bene_fi", "bene_address", "amount", "date"))