I am new to python and I am using pandasql
to perform condition-based joins:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
q = """
SELECT *
FROM
a_Df A
INNER JOIN
b_Df B
ON
a.effective_date between b.effective_date-5 and b.effective_date
and a.id =b.id
"""
pj = pysqldf(q)
This query is basically for scaling date from effective_date-5
and current effective_date
, but this is not functioning in the expected order
i.e. Let's say current effective_date
is '2021-03-21' then effective_date-5
is giving 2016 instead of '2021-03-16', hence I am not getting the desire output.
So the query becomes:
q = """
SELECT *
FROM
a_Df A
INNER JOIN
b_Df B
ON
a.effective_date between '2016' and '2021-03-21'
and a.id =b.id
"""
instead of
q = """
SELECT *
FROM
a_Df A
INNER JOIN
b_Df B
ON
a.effective_date between '2021-03-16' and '2021-03-21'
and a.id =b.id
"""
and as the dateset is huge I cannot perform hardcoded changes.
Does anyone have any idea on how to solve this?
as this
pandasql/sqldf
work onsqlite
so anyone can try any function thatsqlite
support, hence the syntax would be like this..