how to find what is in data frame and not another in R

74 Views Asked by At

I have two data frames: codes and supply. Codes (shown below) is comprised of four fields: state,codetype,code,codetitle supply has 12 columns but three of them are state,codetype, and code

An example of this is below

    state     codetype    code    codetitle
      32          15     123456     Something
      32          15     123455     Something Else
      32          10     123455     Something Else

From there, I use the following code to concatenate the item

    supply1<- supply%>%mutate(supply1= paste0(state,codetype,code))
    codes1<- codes%>%mutate(codes1= paste0(state,codetype,code))

My question is how do I find out what combinations of state,codetype,code are in supply1 but not codes1. I would use excel and the match function to do this but there are 1.9 million rows and that exceeds the capacity of Excel.

Have looked at documentation about antijoin. However, being that there is no common field such as ID, getting a bit confused.

2

There are 2 best solutions below

2
On BEST ANSWER

tidyverse

library(dplyr)
anti_join(supply, codes, by = c("state", "codetype", "code"))
#   state codetype   code   codetitle
# 1    34       15 123459 Something_4

base R

codes$code_rn <- seq_len(nrow(codes))
supply$supply_rn <- seq_len(nrow(supply))
temp <- merge(codes, supply, by = c("state", "codetype", "code"))
temp
#   state codetype   code    codetitle.x code_rn codetitle.y supply_rn
# 1    32       15 123455 Something_Else       2 Something_3         2
# 2    32       15 123456      Something       1 Something_2         1
supply[ !supply$supply_rn %in% temp$supply_rn, ]
#   state codetype   code   codetitle supply_rn
# 3    34       15 123459 Something_4         3

(and some column clean-up)

alternative base R

This is effectively what you were starting with:

supply_id <- with(supply, paste(state, codetype, code, sep = "|"))
supply_id
# [1] "32 15 123456" "32 15 123455" "34 15 123459"
codes_id <- with(codes, paste(state, codetype, code, sep = "|"))
codes_in
# [1] "32|15|123456" "32|15|123455" "32|10|123455"
supply[!supply_id %in% codes_id,]
#   state codetype   code   codetitle supply_rn
# 3    34       15 123459 Something_4         3

data

codes <- read.table(header = TRUE, text="
    state     codetype    code    codetitle
      32          15     123456     Something
      32          15     123455     Something_Else
      32          10     123455     Something_Else")
supply <- read.table(header = TRUE, text="
    state     codetype    code    codetitle
      32          15     123456     Something_2
      32          15     123455     Something_3
      34          15     123459     Something_4")
2
On

With data.table, we do a join on the columns after converting the 'supply' to a data.table (setDT). By negating (!), we check for the elements that are not matching in 'codes' dataset

library(data.table)
setDT(supply)[!codes, on = c("state", "codetype", "code")]