calcite elasticsearch adapter efficient join


What is the proper way to implement a JOIN rewrite so that the outer query is fed with the results of a subquery? For example:

SELECT state FROM zips_view WHERE j IN (SELECT j FROM people_view)

This query gets rewritten to a SemiJoin, which executes a table scan for the inner query (a select against people_view, as expected) and then another table scan for the outer query (a select against zips_view). The second scan could be replaced with a filtered query, e.g.:

SELECT state FROM zips_view WHERE j IN (1,2,3,4)

What's the proper way to implement a "two-phase" JOIN that takes the results of the subquery and adds them to the outer query as a filter condition?

There are 1 best solutions below

I'm using the JDBC connector here, and for your desired query:

SELECT state FROM zips_view WHERE j IN (1,2,3,4)

it generates this relational algebra:

LogicalProject(state=[$0])
  LogicalFilter(condition=[OR(=($0, 1), =($0, 2), =($0, 3), =($0, 4))])
    JdbcTableScan(table=[[zips_iew, state]])

You should start by writing rules that transform your original relational algebra (the one with the SemiJoin) and work your way down until you get relational algebra that looks like the plan above.
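
To make the target of such a rewrite concrete, here is a minimal sketch of the "two-phase" idea in plain Java, deliberately independent of Calcite's planner API. It assumes the inner query has already been executed and its j values are available as a list. The class and method names (TwoPhaseSemiJoin, distinctKeys, outerSql, termsQuery) are hypothetical and exist only for illustration. For the Elasticsearch side it assumes the filter would be pushed down as a standard terms query on the field j.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TwoPhaseSemiJoin {

    // Phase 1: reduce the inner query's result to its distinct, non-null join keys.
    // NULL is dropped because "j IN (..., NULL)" can never match on NULL.
    public static Set<Integer> distinctKeys(List<Integer> innerValues) {
        Set<Integer> keys = new LinkedHashSet<>();
        for (Integer v : innerValues) {
            if (v != null) {
                keys.add(v);
            }
        }
        return keys;
    }

    // Phase 2 (SQL form): turn the keys into a filter on the outer query.
    // An empty key set means the semi-join matches nothing, so emit an
    // always-false predicate instead of the invalid "IN ()".
    public static String outerSql(Collection<Integer> keys) {
        if (keys.isEmpty()) {
            return "SELECT state FROM zips_view WHERE 1 = 0";
        }
        String list = keys.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return "SELECT state FROM zips_view WHERE j IN (" + list + ")";
    }

    // Phase 2 (Elasticsearch form): the same filter expressed as a terms query.
    public static String termsQuery(String field, Collection<Integer> keys) {
        String list = keys.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return "{\"query\": {\"terms\": {\"" + field + "\": [" + list + "]}}}";
    }

    public static void main(String[] args) {
        // Stand-in for the rows returned by "SELECT j FROM people_view".
        List<Integer> inner = Arrays.asList(1, 2, 2, 3, null, 4);
        Set<Integer> keys = distinctKeys(inner);
        System.out.println(outerSql(keys));
        System.out.println(termsQuery("j", keys));
    }
}
```

Whether this beats the second full scan depends on how many distinct keys the inner query returns. With a very large key set the filter itself becomes expensive, and Elasticsearch limits the number of terms per query, so a real planner rule would probably only apply the rewrite below some row-count threshold.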