So I've heard/read before that SQL server can get confused with or
predicates, generally when an or
is provided sql server will actually have run the query twice (one with each side of the or
predicate) and concatenate/intersect the result sets -- running the query twice is less than ideal. So in the below query
SELECT *
FROM foo_table
WHERE bar <> 5 OR baz <> 8
Would actually be evaluated like
SELECT *
FROM foo_table
WHERE bar <> 5
INTERSECT
SELECT *
FROM foo_table
WHERE baz <> 8
Assuming this is correct (frame challenges are welcome), I'm wondering if I couldn't apply demorgan's here to turn the `or` condition into an `and` condition in order to get a single evaluation, and if it would perform better or if there's something I'm missing here.
SELECT *
FROM foo_table
WHERE NOT (bar = 5 and baz = 8)
If you had to find the lines in a file which did not match two conditions, would you write it like that?
The exact details will depend on the database, but SQL databases are generally row oriented, and they will be able to optimize such a simple query.
It would do a single loop through all the rows (a full table scan) checking each row for both conditions once.
We can check with an
explain
. Here it is on Postgres 14.Here's what an intersect looks like. It does scan the table twice and merge.
Yes, but the optimizer is already on it.
The Postgres optimizer has undone your optimization attempt. :)
Demonstration
The performance problem is an index is of no help in this sort of query. If you ask for "all the rows except this small subset" that's going to return most of the rows, so it will just scan the whole table.