I've come across an odd behavior (originally in Snowflake, duplicated in DuckDB & Postgres 12) in SQL involving join optimization and aliasing. Here's some example code -
As seen above in the first SELECT statement, the filtering is done on "host" - my expected behavior here with the aliasing is to filter the righty table on "host", as a naming mismatch occurs between the aliasing and a real column name. In the example above, the query optimizer sees this and somehow decides that the join could be logically optimized to an inner join, even through there's another join clause that could result in unmatched rows from the left side. I would expect these unmatched rows to be returned, regardless of the aliasing semantics on the other clause's column names. Is this just an odd default behavior in SQL systems? I personally believe this should result in an "ambiguous column naming" compilation error that would protect the user from the query optimizer's non-deterministic join type choice.
If someone could explain that somehow above, I'd be very grateful. Thanks for reading!
create table lefty as select 'a' as h, '/a' as p;
insert into lefty values ('a', '/unmatched');
insert into lefty values ('b', '/diffhost');
create table righty as select 'a' as host, '/a' as path;
-- returns 1 row (/unmatched is not returned, INNER join used in query plan)
select h as host, p from lefty left outer join righty on (lefty.h = righty.host and lefty.p = righty.path) where host = 'a';
-- returns 2 rows (/unmatched IS returned, LOJ join used)
select h as host, p from lefty left outer join righty on (lefty.h = righty.host and lefty.p = righty.path) where h = 'a';
The
selectclause is evaluated after thewhereclause (there may be more operations in between e.g. group by, aggregates andhaving).The aliases assigned in the
selectclause are not visible inside thewhereclause so the two queries would be translated as:All RDBMS you mentioned are doing the right thing, nothing to do with optimization.