How can I extract column-level lineage from an arbitrary SQL statement using an open-source tool?

I have SQL statements in various dialects. For each statement I want column-level lineage: for every output column, the source columns it is derived from.

Example: The statement

SELECT A.c1 as c,
       SUM(B.c2) as c2_sum
FROM A
JOIN B
  ON A.c1 = B.c1
GROUP BY A.c1

leads to something like

{
  "c": ["A.c1"],
  "c2_sum": ["B.c2"]
}
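
To make the expected input/output contract concrete, here is a minimal standard-library sketch. It is not a real SQL parser and not an existing tool: the names `naive_column_lineage` and `split_top_level` are hypothetical, and it assumes a single flat SELECT whose select-list items all have the form `expr AS alias` with table-qualified column references. Subqueries, CTEs, `SELECT *`, unqualified columns, and dialect-specific syntax are out of scope, which is exactly why I am looking for a proper open-source solution.

```python
from __future__ import annotations

import json
import re

# Matches table-qualified column references such as A.c1.
QUALIFIED_COLUMN = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def split_top_level(select_list: str) -> list[str]:
    """Split a select list on commas that are not nested inside parentheses."""
    items, depth, current = [], 0, []
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        items.append(tail)
    return items


def naive_column_lineage(sql: str) -> dict[str, list[str]]:
    """Map each aliased output column to the qualified columns in its expression.

    Assumption: one flat SELECT ... FROM with 'expr AS alias' items only.
    """
    match = re.search(r"\bSELECT\b(.*?)\bFROM\b", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected a single SELECT ... FROM statement")
    lineage = {}
    for item in split_top_level(match.group(1)):
        parts = re.split(r"\s+AS\s+", item, flags=re.IGNORECASE)
        if len(parts) != 2:
            raise ValueError(f"expected 'expr AS alias', got {item!r}")
        expression, alias = parts
        sources = [f"{t}.{c}" for t, c in QUALIFIED_COLUMN.findall(expression)]
        lineage[alias.strip()] = sorted(set(sources))
    return lineage


sql = """
SELECT A.c1 as c,
       SUM(B.c2) as c2_sum
FROM A
JOIN B
  ON A.c1 = B.c1
GROUP BY A.c1
"""

print(json.dumps(naive_column_lineage(sql), indent=2))
```

For the example statement this maps `c` to `A.c1` and `c2_sum` to `B.c2`. It only inspects the select list, so columns that influence a result indirectly (join keys, filters, grouping) are ignored; whether a real tool should report those is part of what I would like to understand.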

The statements stay mostly close to standard SQL. The dialects involved are Spark SQL, PostgreSQL, Presto, and Athena SQL.
