I have a huge dataset and I would like to add a dummy variable column based on a set of conditions.
My main df (A) has 5 million rows and 10 columns, 4 of which are date, hour, minute and second; these run from 2020 to 2023.
The other df (B) has the same columns but only 30 rows.
I want A to look at B and put a 1 in every row where date, hour, minute and second match the date, hour, minute and second of a row in B, and a 0 in all the rest. So, in the end, I should end up with a column containing 30 ones and 4,999,970 zeros.
Even better would be to have date, hour and minute match exactly, and second match "more or less" (say, within +/- 5 seconds).
Can you help, please?
I thought a solution could have been:
A$dummy <- for (i in A){
ifelse("A$date"=="B$date"&"A$hour"=="B$hour"&
"A$minute"=="B$minute"&or("A$second">="B$second"-5,"A$second"<="B$second"+5),1,0)
}

Here is a solution using tidyverse (including code that generates example data).
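A minimal sketch of the exact-match case, assuming dplyr is installed. The tiny A and B below are hypothetical stand-ins for the real data, and the names `keys` and `A_exact` are made up for illustration. The idea is to tag every distinct key combination in B with `dummy = 1`, left-join that onto A, and fill the unmatched rows with 0.

```r
library(dplyr)

# Hypothetical example data standing in for the real A (5 million rows) and B (30 rows)
A <- tibble(
  date   = as.Date("2020-01-01") + c(0, 0, 1, 2),
  hour   = c(10L, 10L, 11L, 12L),
  minute = c(30L, 30L, 15L, 0L),
  second = c(5L, 40L, 20L, 59L),
  value  = c(1.2, 3.4, 5.6, 7.8)
)
B <- tibble(
  date   = as.Date("2020-01-01") + c(0, 2),
  hour   = c(10L, 12L),
  minute = c(30L, 0L),
  second = c(5L, 59L)
)

keys <- c("date", "hour", "minute", "second")

# Keep only the key columns of B, drop duplicate keys so the join cannot
# multiply rows of A, and mark every remaining key with dummy = 1
B_flag <- B %>%
  select(all_of(keys)) %>%
  distinct() %>%
  mutate(dummy = 1L)

# Rows of A without a match get NA from the join; turn those into 0
A_exact <- A %>%
  left_join(B_flag, by = keys) %>%
  mutate(dummy = coalesce(dummy, 0L))

A_exact
```

Because the keys of B are de-duplicated before the join, `A_exact` has exactly as many rows as `A`. The join only works if the key columns have the same types in both data frames (for example, `date` as Date in both), so that is worth checking on the real data first.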
It is fully possible to use the same approach to create a dummy variable indicating whether a row in A has a matching row in B within a +/- 5 second margin. However, based on the data in the image you provided, some rows in A would probably have multiple "close-enough" matches in B. This could easily result in duplicates of rows from A.
Assuming you want to keep A at its original number of rows and add a column to A indicating whether or not there are one or more rows in B that match within a +/- 5 second margin, you could do this:
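A minimal sketch of the +/- 5 second version, again assuming dplyr and using hypothetical example data; the names `row_id`, `second_B`, `matched` and `A_fuzzy` are made up for illustration. It joins on date, hour and minute only, keeps the pairs whose seconds differ by at most 5, and then flags the matching row ids in A, so multiple close matches in B never duplicate rows of A.

```r
library(dplyr)

# Hypothetical example data; the first row of A is within 5 seconds of two rows of B
A <- tibble(
  date   = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02")),
  hour   = c(10L, 10L, 10L, 11L),
  minute = c(30L, 30L, 31L, 15L),
  second = c(5L, 40L, 27L, 20L)
)
B <- tibble(
  date   = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  hour   = c(10L, 10L, 10L),
  minute = c(30L, 30L, 31L),
  second = c(3L, 8L, 30L)
)

# Find the row ids of A that have at least one close-enough match in B.
# Note: newer dplyr versions may warn about a many-to-many relationship here;
# that is expected, since several rows on each side can share the same minute.
matched <- A %>%
  mutate(row_id = row_number()) %>%
  inner_join(
    B %>% select(date, hour, minute, second_B = second),
    by = c("date", "hour", "minute")
  ) %>%
  filter(abs(second - second_B) <= 5) %>%
  distinct(row_id)

# Flag the matched rows; A keeps its original number of rows
A_fuzzy <- A %>%
  mutate(dummy = as.integer(row_number() %in% matched$row_id))

A_fuzzy
```

Since B has only 30 rows, the intermediate join stays small even when A has millions of rows. One consequence of requiring the minute to match exactly is that times straddling a minute boundary (say 10:30:58 in A and 10:31:02 in B) are not treated as a match, even though they are only 4 seconds apart.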