Oracle semi-join with multiple tables in SQL subquery

934 Views Asked by At

This question is how to work around the apparent oracle limitation on semi-joins with multiple tables in the subquery. I have the following 2 UPDATE statements.

Update 1:

UPDATE
     (SELECT a.flag update_column
      FROM a, b
      WHERE a.id = b.id AND
            EXISTS (SELECT NULL
                    FROM c
                    WHERE c.id2 = b.id2 AND
                          c.time BETWEEN start_in AND end_in) AND
            EXISTS (SELECT NULL
                    FROM TABLE(update_in) d
                    WHERE b.time BETWEEN d.start_time AND d.end_time))
SET update_column = 'F'

The execution plan indicayes that this correctly performs 2 semi-joins, and the update executes in seconds. These need to be semi-joins because c.id2 is not a unique foreign key on b.id2, unlike b.id and a.id. And update_in doesn't have any constraints at all since it's an array.

Update 2:

UPDATE
     (SELECT a.flag update_column
      FROM a, b
      WHERE a.id = b.id AND
            EXISTS (SELECT NULL
                    FROM c, TABLE(update_in) d
                    WHERE c.id2 = b.id2 AND
                          c.time > d.time AND
                          b.time BETWEEN d.start_time AND d.end_time))
SET update_column = 'F'

This does not do a semi-join; I believe based on the Oracle documentation that's because the EXISTS subquery has 2 tables in it. Due to the sizes of the tables, and partitioning, this update takes hours. However, there is no way to relate d.time to the associated d.start_time and d.end_time other than being on the same row. And the reason we pass in the update_in array and join it here is because running this query in a loop for each time/start_time/end_time combination also proved to give poor performance.

Is there a reason other than the 2 tables that the semi-join could be not working? If not, is there a way around this limitation? Some simple solution I am missing that could make these criteria work without putting 2 tables in the subquery?

1

There are 1 best solutions below

1
On

As Bob suggests you can use a Global Temporary Table (GTT) with the same structure as your update_in array, but the key difference is that you can create indexes on the GTT, and if you populate the GTT with representative sample data, you can also collect statistics on the table so the SQL query analyzer is better able to predict an optimal query plan.

That said there are also some other notable differences in your two queries:

  • In the first exists clause of your first query you refer to two columns start_in and end_in that don't have table references. My guess is that they are either columns in table a or b, or they are variables within the current scope of your sql statement. It's not clear which.
  • In your second query you refer to column d.time, however, you don't use that column in the first query.

Does updating your second query to the following improve it's performance?

UPDATE
     (SELECT a.flag update_column
      FROM a, b
      WHERE a.id = b.id AND
            EXISTS (SELECT NULL
                    FROM c, TABLE(update_in) d
                    WHERE c.id2 = b.id2 AND
                          c.time BETWEEN start_in AND end_in AND
                          c.time > d.time AND
                          b.time BETWEEN d.start_time AND d.end_time))
SET update_column = 'F'