I've read "or" predicates can cause poor SQL server performance, can demorgans save me?

129 Views Asked by At

So I've heard/read before that SQL server can get confused with or predicates, generally when an or is provided sql server will actually have run the query twice (one with each side of the or predicate) and concatenate/intersect the result sets -- running the query twice is less than ideal. So in the below query

SELECT *
FROM foo_table
WHERE bar <> 5 OR baz <> 8

Would actually be evaluated like

SELECT *
FROM foo_table
WHERE bar <> 5
INTERSECT
SELECT *
FROM foo_table
WHERE baz <> 8

Assuming this is correct (frame challenges are welcome), I'm wondering if I couldn't apply demorgan's here to turn the `or` condition into an `and` condition in order to get a single evaluation, and if it would perform better or if there's something I'm missing here.
SELECT *
FROM foo_table
WHERE NOT (bar = 5 and baz = 8)
1

There are 1 best solutions below

0
On

generally when an or is provided sql server will actually have run the query twice (one with each side of the or predicate) and concatenate/intersect the result sets

If you had to find the lines in a file which did not match two conditions, would you write it like that?

The exact details will depend on the database, but SQL databases are generally row oriented, and they will be able to optimize such a simple query.

It would do a single loop through all the rows (a full table scan) checking each row for both conditions once.

We can check with an explain. Here it is on Postgres 14.

explain SELECT *
FROM test
WHERE bar <> 5 OR baz <> 8

QUERY PLAN
Seq Scan on test  (cost=0.00..19425.00 rows=999975 width=8)
  Filter: ((bar <> 5) OR (baz <> 8))

Here's what an intersect looks like. It does scan the table twice and merge.

explain SELECT *
FROM test
WHERE bar <> 5
INTERSECT
SELECT *
FROM test
WHERE baz <> 8

HashSetOp Intersect  (cost=0.00..73650.00 rows=40000 width=12)
  ->  Append  (cost=0.00..63700.00 rows=1990000 width=12)
        ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..26875.00 rows=995000 width=12)
              ->  Seq Scan on test  (cost=0.00..16925.00 rows=995000 width=8)
                    Filter: (bar <> 5)
        ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..26875.00 rows=995000 width=12)
              ->  Seq Scan on test test_1  (cost=0.00..16925.00 rows=995000 width=8)
                    Filter: (baz <> 8)

can demorgans save me?

Yes, but the optimizer is already on it.

explain SELECT *
FROM test
WHERE NOT (bar = 5 and baz = 8)

QUERY PLAN
Seq Scan on test  (cost=0.00..19425.00 rows=999975 width=8)
  Filter: ((bar <> 5) OR (baz <> 8))

The Postgres optimizer has undone your optimization attempt. :)

Demonstration


The performance problem is an index is of no help in this sort of query. If you ask for "all the rows except this small subset" that's going to return most of the rows, so it will just scan the whole table.