Create different dataframe from reference dataframes based on date

26 Views Asked by At

I have a question regarding the creation of new dataframes, based on the combination of 2 existing ones.

For the following example, for each row in df1 (which has a unique combination of var1 and var2), a new data frame should be made created from df2 (which has the same values for var1 and var2 but has multiple IDs with those variables). The newly created DFs shoud only have records where that have a date_ID < date_agg. Obviously, my real dataframes are much larger so I'm hoping for something not too computationally intensive ;-) By the way, the new dataframes should contain all records from df2 that follow the date_ID < date_agg rule, not just records with the same var1/var2 combination.

sample data:

set.seed(1)

var1 <- c("A","B","C","D")
var2 <- c("X","Y","Z")

df1 <- expand.grid(var1,var2)
df1$date_agg = sample(seq(as.Date('2000/01/01'), as.Date('2023/01/01'), by="day"), 12, replace = TRUE)

df2 <- data.frame(ID = sample(1:1000, replace=FALSE),
                  var1 = sample(c("A","B","C","D"),1000, replace = TRUE),
                  var2 = sample(c("X","Y","Z"),1000, replace = TRUE),
                  date_ID = sample(seq(as.Date('2000/01/01'), as.Date('2023/01/01'), by="day"), 1000, replace = TRUE))
1

There are 1 best solutions below

2
Wimpel On BEST ANSWER

No desired output, so hard to check. But I believe you are looking for something like this data.table approach

library(data.table)
# set to data.table format
setDT(df1); setDT(df2)
# split df1 to individual rows
L <- split(df1, f = seq.int(nrow(df1)))
# perform joins, resulting in a list of data.tables
L.joins <- lapply(L, function(x) {
  df2[x, .(ID, var1, var2, date_ID = x.date_ID, date_agg = i.date_agg), 
      on = .(var1 = Var1 , var2 = Var2, date_ID < date_agg)]
})
# the first entry of the list looks like
L.joins[[1]]
#       ID   var1   var2    date_ID   date_agg
#    <int> <char> <char>     <Date>     <Date>
# 1:   940      A      X 2001-02-16 2002-10-13
# 2:   500      A      X 2000-09-02 2002-10-13
# 3:   829      A      X 2000-03-23 2002-10-13
# 4:    25      A      X 2000-10-25 2002-10-13
# 5:   835      A      X 2000-06-10 2002-10-13
# 6:   710      A      X 2000-03-19 2002-10-13
# 7:   409      A      X 2000-05-18 2002-10-13
# 8:   551      A      X 2002-07-07 2002-10-13
# 9:    21      A      X 2001-06-10 2002-10-13
#10:   496      A      X 2002-06-10 2002-10-13
#11:   160      A      X 2001-07-19 2002-10-13
#12:   773      A      X 2000-06-21 2002-10-13