Use multiple columns as identifiers while comparing two data frames in R using setdiff

475 Views Asked by Mr.CR At 26 June 2025 at 23:30

I have two data frames to compare. Screenshots of the data frames are shown below.

There are three things I am trying to check:

1st Check: Items that exist in Data 1 but not in Data 2 [Item4; SubItem4; SubsubItem1]
2nd Check: Items that do not exist in Data 1 but do exist in Data 2 [Item6; SubItem1; SubsubItem1]
3rd Check: Items that exist in both data frames but whose value has changed [Item2; SubItem5; SubsubItem1]

I got the first and second checks easily with anti_join():

MissingfromData2 <- anti_join(Data1,Data2, by = c("Property.1","Property.2","Property.3"))
MissingfromData1 <- anti_join(Data2,Data1, by = c("Property.1","Property.2","Property.3"))
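
Since the screenshots are not reproduced here, the following is a minimal sketch of the two checks on made-up data. The three Property columns and the item names come from the description above; the Value column name and all of its numbers are assumptions for illustration only.

```r
library(dplyr)

# Key columns that identify an item (names taken from the question)
keys <- c("Property.1", "Property.2", "Property.3")

# Made-up sample data; the Value column and its numbers are assumptions
Data1 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item4"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem4"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value      = c(10, 20, 40),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item6"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem1"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value      = c(10, 25, 60),
  stringsAsFactors = FALSE
)

# 1st check: keys present in Data1 but absent from Data2
MissingfromData2 <- anti_join(Data1, Data2, by = keys)

# 2nd check: keys present in Data2 but absent from Data1
MissingfromData1 <- anti_join(Data2, Data1, by = keys)
```

With this sample data, each result should hold a single row: the Item4 row for the first check and the Item6 row for the second.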

For the 3rd check, however, I cannot seem to restrict the comparison to the identifier columns given in by = c("Property.1","Property.2","Property.3").

When I run the following:

changedValue1 <- setdiff(Data1,Data2, by = c("Property.1","Property.2","Property.3"))

changedValue2 <- setdiff(Data2,Data1, by = c("Property.1","Property.2","Property.3"))

I also get the additional rows from checks 1 and 2, which I do not need.

How do I obtain the result for only changed values?

There is 1 best solution below

Mr.CR On 16 September 2020 at 10:09

I found the solution to the problem. All I needed to add to the code above was the following line:

Result <- setdiff(changedValue2,MissingfromData1, by = c("Property.1","Property.2","Property.3"))

This returned the only row in changedValue2 that was not also in MissingfromData1, that is, the row whose value column differs between the two data frames.
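
A related way to get the same result can be sketched as follows. It relies on dplyr's setdiff() comparing whole rows of the two data frames, so it is called without by; a semi_join() on the identifier columns then keeps only rows whose key also exists in the other data frame. The sample data, the Value column name, and the intermediate name changedOrNew are assumptions for illustration, not part of the original data.

```r
library(dplyr)

# Key columns that identify an item (names taken from the question)
keys <- c("Property.1", "Property.2", "Property.3")

# Made-up sample data; the Value column and its numbers are assumptions
Data1 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item4"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem4"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value      = c(10, 20, 40),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item6"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem1"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value      = c(10, 25, 60),
  stringsAsFactors = FALSE
)

# Rows of Data2 with no identical row in Data1:
# this includes both brand-new items and items whose value changed
changedOrNew <- setdiff(Data2, Data1)

# Keep only rows whose key also exists in Data1,
# which drops the brand-new items and leaves the changed ones
Result <- semi_join(changedOrNew, Data1, by = keys)
```

With this sample data, Result should contain only the Item2 row as it appears in Data2. Swapping Data1 and Data2 gives the same item with its old value.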
