I have two data frames to compare. Screenshots of the data frames are shown below.
There are three things I am trying to check:
- 1st Check: Items that exist in Data 1 but do not exist in Data 2 [Item4; SubItem4; SubsubItem1]
- 2nd Check: Items that do not exist in Data 1 but do exist in Data 2 [Item6; SubItem1; SubsubItem1]
- 3rd Check: Items that exist in both data frames but whose value has changed [Item2; SubItem5; SubsubItem1]
I got the first and second checks easily with anti_join():
MissingfromData2 <- anti_join(Data1, Data2, by = c("Property.1", "Property.2", "Property.3"))
MissingfromData1 <- anti_join(Data2, Data1, by = c("Property.1", "Property.2", "Property.3"))
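Since the data frames are only shown as screenshots, here is a small reproducible sketch of the first two checks. The rows and the `Value` column name are hypothetical stand-ins chosen to match the three items named in the checks; only the `Property.*` column names come from my real data.

```r
library(dplyr)

# Identifier columns shared by both data frames
keys <- c("Property.1", "Property.2", "Property.3")

# Hypothetical stand-in for Data 1 (the `Value` column name is an assumption)
Data1 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item4"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem4"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for Data 2: Item4 is dropped, Item6 is new,
# and the value of Item2 has changed
Data2 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item6"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem1"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value = c(10, 25, 40),
  stringsAsFactors = FALSE
)

# 1st check: key combinations present in Data1 but absent from Data2
MissingfromData2 <- anti_join(Data1, Data2, by = keys)

# 2nd check: key combinations present in Data2 but absent from Data1
MissingfromData1 <- anti_join(Data2, Data1, by = keys)
```

With this toy data, each result should contain a single row: the Item4 row for the first check and the Item6 row for the second.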
For the 3rd check, however, I cannot seem to restrict the comparison to the identifier columns given in by = c("Property.1", "Property.2", "Property.3").
When I do the following:
changedValue1 <- setdiff(Data1,Data2, by = c("Property.1","Property.2","Property.3"))
changedValue2 <- setdiff(Data2,Data1, by = c("Property.1","Property.2","Property.3"))
I also get the additional rows from check 1 and check 2, which I do not need.
How do I obtain the result for only changed values?
I found the solution to the problem. All I needed to add to the code above was the following bit of code:
This returned the only row of changedValue2 that was not in MissingfromData1, i.e. the row with the desired difference in the value columns.
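For anyone who wants to reproduce this, below is a minimal sketch of one way to get that result. It is not necessarily the exact code I used: the toy data, the `Value` column name, and the names `keys`, `changedOnly`, and `changedBoth` are assumptions for illustration. The idea is to take the whole-row difference with setdiff() (without `by`, since it compares complete rows) and then remove the rows already reported by the second check.

```r
library(dplyr)

keys <- c("Property.1", "Property.2", "Property.3")

# Hypothetical toy data; `Value` is an assumed column name
Data1 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item4"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem4"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value = c(10, 20, 30),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Property.1 = c("Item1", "Item2", "Item6"),
  Property.2 = c("SubItem1", "SubItem5", "SubItem1"),
  Property.3 = c("SubsubItem1", "SubsubItem1", "SubsubItem1"),
  Value = c(10, 25, 40),
  stringsAsFactors = FALSE
)

# 2nd check: keys that are new in Data2
MissingfromData1 <- anti_join(Data2, Data1, by = keys)

# Whole-row difference: rows of Data2 with no identical row in Data1.
# This contains both the changed row (Item2) and the new row (Item6).
changedValue2 <- setdiff(Data2, Data1)

# 3rd check: drop the new rows, leaving only keys whose value changed
changedOnly <- anti_join(changedValue2, MissingfromData1, by = keys)

# Alternative sketch: join on the keys and keep rows whose values differ,
# which shows the old and new value side by side
# (rows where either value is NA are dropped by this comparison)
changedBoth <- inner_join(Data1, Data2, by = keys,
                          suffix = c(".Data1", ".Data2")) %>%
  filter(Value.Data1 != Value.Data2)
```

With the toy data above, `changedOnly` should hold just the Item2 row from Data2, while `changedBoth` holds the same key with both the Data1 and Data2 values in separate columns.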