Conditions in LEFT JOIN (OUTER JOIN) vs INNER JOIN

2.2k Views Asked by At
SELECT A.COL1, B.COL1,C.COL1
FROM TABLEA A
LEFT JOIN TABLEB B ON A.COL1 = B.COL1
LEFT JOIN TABLEC C ON (
        C.COL3 IS NOT NULL
        AND (
              C.COL2 = 664
              AND A.COL1 = C.COL1
        )
)

In regards to technicalities of SQL, what does the condition written in parentheses after LEFT JOIN TABLE C ON mean? Why are those necessary?

3

There are 3 best solutions below

0
On

An inner join (JOIN or INNER JOIN, CROSS JOIN or comma) first does a CROSS JOIN. (Ie returns all rows that can be made by appending a row from its left table and a row from its right table.) Then any ON removes rows that don't meet its condition. An OUTER JOIN returns the rows of a corresponding INNER JOIN plus, for the left table (LEFT) or right table (RIGHT) or both tables (FULL), any unmatched rows extended with NULLs. After the FROM a WHERE removes rows that don't meet its condition.

If a condition is in an ON then matching rows are removed in the FROM. But if that condition is instead in a WHERE then matching rows and any rows incorporating them via later joins still get removed. So if a FROM only has inner joins then it doesn't matter whether a condition is in an ON or a WHERE.

But if a FROM has an OUTER JOIN ON a condition then cross join rows not meeting the condition are removed and certain NULL-extended rows are added whereas moving that condition to a WHERE does the removal but not the addition.

It's not necessary for the language to have ON for INNER JOIN since instead of t1 INNER JOIN t2 ON condition one could involve (SELECT * FROM t1 INNER JOIN t2 WHERE condition) instead.

From the above you can work out the following: For a sequence of INNER JOINS after any last OUTER JOIN (including when there are no OUTER JOINs) one can freely move conditions between their ONs and a WHERE. But not for ONs of or before any last OUTER JOIN because they can affect its inputs and so affect what NULLed rows are output. There's just no reason to expect the same result if such a condition were moved from the ON to a WHERE.

For your code: Likely the query is designed to return A.COL1 as an id with associated A, B and C info, with those that don't have B info nevertheless included (with B & C info NULLed) and those that do but don't have C info or do but don't have non-NULL C.COL3 or do but don't have C.COL2=664 nevertheless included (with C info NULLed).

0
On

With an OUTER JOIN your C may be NULL because there is no match for A.COL1.

If that's the case then there can never be C.COL2 = 664 meaning any WHERE statement would throw away the row.

By putting the condition into the ON clause you are asking that to be applied to the JOIN condition, not as a filter on the result.

For non-outer joins it makes no difference.

1
On

Earlier versions of ANSI SQL did not contain the ON clause for join conditions – it used the where clause for everything. This was fine for inner joins, but as database applications started using outer joins, there were problems that arose with this approach. Some of you may remember the original ANSI-89-era syntax for *= and =*. These were used on the predicates to define the behavior for an outer join. In this case, we’ll preserve non-matching rows from A in addition to the normal rows returned from a join:

SELECT A.COL1, B.COL1,C.COL1
FROM TABLEA A
LEFT JOIN TABLEB B ON A.COL1 = B.COL1
LEFT JOIN TABLEC C ON (
        C.COL3 IS NOT NULL
        AND (
              C.COL2 = 664
              AND A.COL1 = C.COL1
        )
)

This is very helpful when you are trying to figure out what should happen in a query. It helps semantically define the set of rows that should return from the join.

The ON clause: This syntax allows you to specify the column names for join keys in both tables The ON clause is used to join tables where the column names don’t match in both tables. The join conditions are removed from the filter conditions in the WHERE clause.