I am new to python and I am using pandasql to perform condition-based joins:

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

q = """
    SELECT *
        FROM
            a_Df A
        INNER JOIN
            b_Df B
        ON 
            a.effective_date between b.effective_date-5 and b.effective_date
            and a.id =b.id
    """
pj = pysqldf(q)

This query is basically for scaling date from effective_date-5 and current effective_date, but this is not functioning in the expected order i.e. Let's say current effective_date is '2021-03-21' then effective_date-5 is giving 2016 instead of '2021-03-16', hence I am not getting the desire output.

So the query becomes:

q = """
        SELECT *
            FROM
                a_Df A
            INNER JOIN
                b_Df B
            ON 
                a.effective_date between '2016' and '2021-03-21'
                and a.id =b.id
"""

instead of

q = """
        SELECT *
            FROM
                a_Df A
            INNER JOIN
                b_Df B
            ON 
                a.effective_date between '2021-03-16' and '2021-03-21'
                and a.id =b.id
    """

and as the dateset is huge I cannot perform hardcoded changes.

Does anyone have any idea on how to solve this?

1

There are 1 best solutions below

0
On

as this pandasql/sqldf work on sqlite so anyone can try any function that sqlite support, hence the syntax would be like this..

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

q = """
    SELECT *
        FROM
            a_Df A
        INNER JOIN
            b_Df B
        ON 
            a.effective_date between date(b.effective_date, '-5 day')and b.effective_date
            and a.id =b.id
    """
pj = pysqldf(q)