This is a problem that took me a long time to solve, and I wanted to share my solution. Here's the problem.
We have 2 pandas DataFrames that need to be outer joined on a very complex condition. Here was mine:
condition_statement = """
ON (
A.var0 = B.var0
OR (
A.var1 = B.var1
AND (
A.var2 = B.var2
OR A.var3 = B.var3
OR A.var4 = B.var4
OR A.var5 = B.var5
OR A.var6 = B.var6
OR A.var7 = B.var7
OR (
A.var8 = B.var8
AND A.var9 = B.var9
)
)
)
)
"""
Doing this in pandas would be a nightmare.
I like to do most of my DataFrame massaging with the pandasql package. It lets you run SQL queries on top of the DataFrames in your local environment.
The problem with pandasql is it runs on a SQLite engine, so you can't do RIGHT or FULL OUTER joins.
So how do you approach this problem?
Well you can achieve a FULL OUTER join with two LEFT joins, a condition, and a UNION.
First, declare a snippet with the columns you want to retrieve:
Next, build a condition that represents all values in A being NULL. I built mine using the columns in my DataFrame:
Now, do the 2-LEFT-JOIN-with-a-UNION trick using all of these snippets: