Create a new dataframe according to the contrast between two similar df

214 Views Asked by At

I have a dataframe made like this:

  X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3

After several steps (not important which one) i obtained this df:

  X Y Z T
  1 2 4 2
  3 2 NA 4
  7 5 NA 3

i want to obtain a new dataframe made by only the rows which didn't change during the steps; the result would be this one:

 X  Y  Z  T
 1  2  4  2
 7  5  NA 3

How could I do?

4

There are 4 best solutions below

0
On BEST ANSWER

One option with base R would be to paste the rows of each dataset together and compare (==) to create a logical vector which we use for subsetting the new dataset

dfO[do.call(paste, dfO) == do.call(paste, df),]
#   X Y  Z T
#1 1 2  4 2
#3 7 5 NA 3

where 'dfO' is the old dataset and 'df' is the new

1
On

I am afraid that neither semi join, nor intersect or merge are the correct answers. merge and intersect will not handle duplicate rows properly. semi join will change order of the rows.

From this perspective, I think the only correct one so far is akrun's.

You could also do something like:

df1[rowSums(((df1 == df2) | (is.na(df1) & is.na(df2))), na.rm = T) == ncol(df1),]

But I think akrun's way is more elegant and likely to perform better in terms of speed.

0
On

You can use dplyr's intersect function:

library(dplyr)
intersect(d1, d2)
#  X Y  Z T
#1 1 2  4 2
#2 7 5 NA 3

This is a data.frame-equivalent of base R's intersect function.

In case you're working with data.tables, that package also provides such a function:

library(data.table)
setDT(d1)
setDT(d2)
fintersect(d1, d2)
#   X Y  Z T
#1: 1 2  4 2
#2: 7 5 NA 3
0
On

Another dplyr solution: semi_join.

dt1 %>% semi_join(dt2, by = colnames(.))
  X Y  Z T
1 1 2  4 2
2 7 5 NA 3

Data

dt1 <- read.table(text = "X Y  Z T
  1 2  4 2
  3 2  1 4
  7 5 NA 3",
                  header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "  X Y Z T
  1 2 4 2
                  3 2 NA 4
                  7 5 NA 3",
                  header = TRUE, stringsAsFactors = FALSE)