I have SQL statements in various dialects, and I want to extract column-level lineage information for each statement.
Example: The statement
SELECT A.c1 as c,
SUM(B.c2) as c2_sum
FROM A
JOIN B
ON A.c1 = B.c1
leads to something like
{
"c": ["A.c1"],
"c2_sum": ["B.c2"]
}
The SQL statements mostly stay close to the standard. The dialects involved are Spark SQL, PostgreSQL, Presto, and Athena SQL.
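To make the desired mapping concrete, here is a deliberately naive sketch in Python. It is not a real solution: it assumes a single flat SELECT list, fully qualified columns (table.column), and an explicit AS alias on every expression. The function name `naive_column_lineage` is hypothetical and only serves as an illustration; anything with subqueries, CTEs, `SELECT *`, or multi-argument functions would need a proper SQL parser.

```python
import json
import re

# Matches qualified column references such as A.c1 (assumption: every
# source column in the statement is written as table.column).
QUALIFIED_COLUMN = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def naive_column_lineage(sql: str) -> dict[str, list[str]]:
    """Map each output alias to the qualified source columns it uses.

    Hypothetical helper for illustration only; it does not parse SQL.
    """
    # Take the text between SELECT and the first FROM.
    match = re.search(
        r"\bSELECT\b(.*?)\bFROM\b", sql, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a SELECT ... FROM statement")

    lineage: dict[str, list[str]] = {}
    # Splitting on commas breaks on functions with several arguments,
    # which is one reason a real parser is needed.
    for item in match.group(1).split(","):
        parts = re.split(r"\s+AS\s+", item.strip(), flags=re.IGNORECASE)
        if len(parts) != 2:
            raise ValueError(f"expected '<expression> AS <alias>': {item!r}")
        expression, alias = parts
        sources = [f"{t}.{c}" for t, c in QUALIFIED_COLUMN.findall(expression)]
        # Drop duplicates while keeping first-seen order.
        lineage[alias.strip()] = list(dict.fromkeys(sources))
    return lineage


if __name__ == "__main__":
    statement = """
    SELECT A.c1 as c,
    SUM(B.c2) as c2_sum
    FROM A
    JOIN B
    ON A.c1 = B.c1
    """
    print(json.dumps(naive_column_lineage(statement), indent=2))
```

What I am looking for is something that produces this kind of mapping reliably across the dialects listed above, without the restrictions this sketch relies on.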