data: Merging Pandas DataFrame within a specific Date Range
I want to use pyjanitor referring this post but it doesn't give if merge with another columns at the same time.
Try
df_1.conditional_join(
... df_2,
... ("InvoiceDate ", "PromotionStart ", ">="),
... ("InvoiceDate ", "PromotionEnd ", "<=")
... )
output
+------------+-------------+---------------+-----------+
| LocationNo | InvoiceDate | InvoiceAmount | Promotion |
+------------+-------------+---------------+-----------+
| A | 01-Jul-20 | 79 | Yes |
+------------+-------------+---------------+-----------+
| B | 01-Jul-20 | 72 | |
+------------+-------------+---------------+-----------+
| C | 01-Jul-20 | 24 | |
+------------+-------------+---------------+-----------+
| A | 02-Jul-20 | 68 | Yes |
+------------+-------------+---------------+-----------+
| B | 02-Jul-20 | 6 | Yes |
+------------+-------------+---------------+-----------+
| C | 02-Jul-20 | 27 | |
+------------+-------------+---------------+-----------+
| A | 03-Jul-20 | 25 | |
+------------+-------------+---------------+-----------+
| B | 03-Jul-20 | 62 | Yes |
+------------+-------------+---------------+-----------+
| C | 03-Jul-20 | 58 | Yes |
+------------+-------------+---------------+-----------+
| D | 03-Jul-20 | 36 | Yes |
+------------+-------------+---------------+-----------+
| E | 03-Jul-20 | 65 | |
+------------+-------------+---------------+-----------+
| F | 03-Jul-20 | 81 | |
+------------+-------------+---------------+-----------+
With conditional_join you need to include another tuple for the equi join:
As @HenryEcker rightly pointed out, your question can be solved by merging, then filtering. For your use case conditional_join might be overkill; for equi joins, what conditional_join does is to intercept the indices that are generated from pandas internal merge function (a hash implementation), and then run the non-equi joins, before creating the final dataframe. No sorting is done, unlike the strictly non equi joins, as hash merge are usually faster for equi joins (caveat - R's data.table uses some form of binary search for its joins and is quite fast, usually even faster than Pandas). For large dataframes, it might come in handy with some performance gains; your mileage might vary.
Let's look at a silly example below with a million rows - the code below is based on the dev version: