I was wondering whether someone knows if the dplyr
extension packages (dbplyr
and dtplyr
) allow non-equi joins within the usual dplyr workflow? I rarely need data.table
, but fast non-equi joins are the only moments where I always need to setDT
, then perform the join and then switch back with as_tibble()
. I scrolled through the issues in the packages on github but didn't find whether this is something that's planned or already implemented.
Non-equi join in tidyverse
2k Views Asked by b_surial At
3
There are 3 best solutions below
2

For dbplyr: While SQL supports non-equi joins, I have not found a dplyr approach that is equivalent. My usual work around is very similar to the r-bloggers link posted by @Waldi to join on the equality conditions and then filter on the inequality conditions.
For example:
output = join(df1, df2, by = c("df1_id" = "df2_id")) %>%
filter(df1_date <= df2_date)
This translates to SQL similar to:
SELECT *
FROM df1
JOIN df2
ON df1_id = df2_id
WHERE df1_date <= df2_date
Which is not that different from:
SELECT *
FROM df1
JOIN df2
ON df1_id = df2_id
AND df1_date <= df2_date
0

There has been a new option for this in dbplyr
since the version 1.4.0: sql_on
. Citing Kirill Müller:
There is #2240 for dplyr, but it's going to take a while. For databases we already have a workaround [ie. generic SQL joins].
library(dplyr)
library(dbplyr)
tbl1 <- memdb_frame(a = 1:3, b = 4:2)
tbl2 <- memdb_frame(c = 1:3, b = 2:0)
left_join(tbl1, tbl2, sql_on = "LHS.b < RHS.c")
Non-equi joins are available since
1.1.0
via the functionjoin_by
. To create non-equi joins, you can use<
,>
,>=
,<=
, or helpersbetween
,within
,overlaps
andnearest
.