Query optimizers have odd behavior with ambiguous aliases, why?

87 Views Asked by At

I've come across an odd behavior (originally in Snowflake, duplicated in DuckDB & Postgres 12) in SQL involving join optimization and aliasing. Here's some example code -

As seen above in the first SELECT statement, the filtering is done on "host" - my expected behavior here with the aliasing is to filter the righty table on "host", as a naming mismatch occurs between the aliasing and a real column name. In the example above, the query optimizer sees this and somehow decides that the join could be logically optimized to an inner join, even through there's another join clause that could result in unmatched rows from the left side. I would expect these unmatched rows to be returned, regardless of the aliasing semantics on the other clause's column names. Is this just an odd default behavior in SQL systems? I personally believe this should result in an "ambiguous column naming" compilation error that would protect the user from the query optimizer's non-deterministic join type choice.

If someone could explain that somehow above, I'd be very grateful. Thanks for reading!

create table lefty as select 'a' as h, '/a' as p;
insert into lefty values ('a', '/unmatched');
insert into lefty values ('b', '/diffhost');
create table righty as select 'a' as host, '/a' as path;

-- returns 1 row (/unmatched is not returned, INNER join used in query plan)
select h as host, p from lefty left outer join righty on (lefty.h = righty.host and lefty.p = righty.path) where host = 'a';

-- returns 2 rows (/unmatched IS returned, LOJ join used)
select h as host, p from lefty left outer join righty on (lefty.h = righty.host and lefty.p = righty.path) where h = 'a';
1

There are 1 best solutions below

0
Salman A On

The select clause is evaluated after the where clause (there may be more operations in between e.g. group by, aggregates and having).

The aliases assigned in the select clause are not visible inside the where clause so the two queries would be translated as:

... where righty.host = 'a';
-- should return 1 row because where clause involves
-- right table which contains one "a" row

... where lefty.h = 'a';
-- should return 2 rows as expected

All RDBMS you mentioned are doing the right thing, nothing to do with optimization.