How to get table names from join condition using gsqlarser?

172 Views Asked by At

I am trying to implement sqlparser and using gsqlparser from here. The source of the jar is in Java but I am implementing the same in Scala.

Below is my query which contains a join condition.

 SELECT e.last_name AS name, e.commission_pct comm, e.salary * 12 "Annual Salary" FROM scott.employees AS e right join scott.companies as c on c.orgid = e.orgid and c.orgname = e.orgn WHERE e.salary > 1000 ORDER BY e.first_name, e.last_name

I was able to parse the query to read names & aliases of columns, where conditions, table names (checking the table names directly inside the query) as below.

val sqlParser = new TGSqlParser(EDbVendor.dbvsnowflake)
sqlParser.sqltext = "SELECT e.last_name AS name, e.commission_pct comm, e.salary * 12 \"Annual Salary\" FROM scott.employees AS e right join scott.companies as c on c.orgid = e.orgid and c.orgname = e.orgn WHERE e.salary > 1000 ORDER BY e.first_name, e.last_name"
val selectStmnt = sqlParser.sqltext
    println("Columns List:")
for(i <- 0 until selectStmnt.getResultColumnList.size()) {
  val resCol = selectStmnt.getResultColumnList.getResultColumn(i)
  println("Column: " + resCol.getExpr.toString + " Alias: " + resCol

.getAliasClause().toString)
    }

Output:

Columns List:
Column: e.last_name Alias: name
Column: e.commission_pct Alias: comm
Column: e.salary * 12 Alias: "Annual Salary"

I am trying to parse the join condition and get the details inside it

for(j <- 0 until selectStmnt.getJoins.size()) {
    println(selectStmnt.getJoins.getJoin(j).getTable)
}

The problem here is there is only one join condition in the query, so the size returned is 1. Hence the output is scott.employees. If I do it a bit different as below using getJoinItems

println("Parsing Join items")
for(j <- 0 until selectStmnt.getJoins.size()) {
    println(selectStmnt.getJoins.getJoin(j).getJoinItems)
}

I get the output by cutting off the first table from the join condition as below:

scott.companies as c on c.orgid = e.orgid and c.orgname = e.orgn

The method: getJoinItems() returns a list: TJoinItemList which I thought of traversing through. But even its size is 1.

println(selectStmnt.getJoins.getJoin(j).getJoinItems.size()) -> 1

I am out of ideas now. Could anyone let me know how can I parse the query's join condition and get the table names inside the join ?

1

There are 1 best solutions below

0
On

I don't have access to Snowflake dialect in GSP but I mimicked this scenario with Teradata dialect using the following query and created a sql parser.

SELECT e.last_name as name
FROM department d
RIGHT JOIN 
trimmed_employee e
ON d.dept_id = e.emp_id
WHERE e.salary > 1000
ORDER BY e.first_name

Here is the Groovy code of getting both the tables department, trimmed_employee. It boils down to iterating over each join and while doing so collect the current join's items (joinItems) using curJoin.joinItems only if it is not null.

stmt.joins.asList().collect { curJoin -> 
  [curJoin.table] + (curJoin?.joinItems?.asList()?.collect { joinItems ->  joinItems.table } ?: [])
}.flatten()

Result:

department
trimmed_employee

For this simple sql that you mentioned in my case, the following code also works.

stmt.tables.asList()