Create a new dataframe according to the contrast between two similar df

224 Views Asked by At

I have a dataframe made like this:

  X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3

After several steps (not important which one) i obtained this df:

  X Y Z T
  1 2 4 2
  3 2 NA 4
  7 5 NA 3

i want to obtain a new dataframe made by only the rows which didn't change during the steps; the result would be this one:

 X  Y  Z  T
 1  2  4  2
 7  5  NA 3

How could I do?

4

There are 4 best solutions below

0
akrun On BEST ANSWER

One option with base R would be to paste the rows of each dataset together and compare (==) to create a logical vector which we use for subsetting the new dataset

dfO[do.call(paste, dfO) == do.call(paste, df),]
#   X Y  Z T
#1 1 2  4 2
#3 7 5 NA 3

where 'dfO' is the old dataset and 'df' is the new

0
talat On

You can use dplyr's intersect function:

library(dplyr)
intersect(d1, d2)
#  X Y  Z T
#1 1 2  4 2
#2 7 5 NA 3

This is a data.frame-equivalent of base R's intersect function.

In case you're working with data.tables, that package also provides such a function:

library(data.table)
setDT(d1)
setDT(d2)
fintersect(d1, d2)
#   X Y  Z T
#1: 1 2  4 2
#2: 7 5 NA 3
0
www On

Another dplyr solution: semi_join.

dt1 %>% semi_join(dt2, by = colnames(.))
  X Y  Z T
1 1 2  4 2
2 7 5 NA 3

Data

dt1 <- read.table(text = "X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3",
                  header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "  X Y Z T
  1 2 4 2
                  3 2 NA 4
                  7 5 NA 3",
                  header = TRUE, stringsAsFactors = FALSE)
1
ira On

I am afraid that neither semi join, nor intersect or merge are the correct answers. merge and intersect will not handle duplicate rows properly. semi join will change order of the rows.

From this perspective, I think the only correct one so far is akrun's.

You could also do something like:

df1[rowSums(((df1 == df2) | (is.na(df1) & is.na(df2))), na.rm = T) == ncol(df1),]

But I think akrun's way is more elegant and likely to perform better in terms of speed.