Why the lazy join type in many recursive queries?

27 Views Asked by At

I will often see a normal JOIN written as a CROSS JOIN with a WHERE clause tacked onto the end, especially when going through WITH RECURSIVE documentation. For example:

a,b WHERE   (a.id=b.parent_id) -- same as: a CROSS JOIN b WHERE a.id=b.parent_id
a JOIN b ON (a.id=b.parent_id)

Is there ever any reason to do this? My thinking is that it's possible that the query optimizer doesn't convert the cross join to the first version, so it might be less efficient.

For an actual example, here would be one:

CREATE TABLE body AS (
    SELECT 1 id, 'body' AS name, NULL parent_id UNION ALL
    SELECT 2, 'head', 1 UNION ALL
    SELECT 3, 'eyes', 2 UNION ALL
    SELECT 4, 'pupils', 3 UNION ALL
    SELECT 5, 'torso', 1 UNION ALL
    SELECT 6, 'arms', 5
);

-- using JOIN with condition
WITH RECURSIVE body_paths (id, part, path) AS (
    SELECT id, name, name AS path FROM body WHERE parent_id IS NULL -- VALUES (1, "body", "body")
    UNION ALL
    SELECT body.id, body.name, CONCAT(path, '.', body.name) 
      FROM body JOIN body_paths ON (body.parent_id=body_paths.id)
) SELECT * FROM body_paths WHERE part in ('pupils', 'arms');

-- using CROSS JOIN + WHERE
WITH RECURSIVE body_paths (id, part, path) AS (
    SELECT id, name, name AS path FROM body WHERE parent_id IS NULL -- VALUES (1, "body", "body")
    UNION ALL
    SELECT body.id, body.name, CONCAT(path, '.', body.name) 
      FROM body, body_paths WHERE (body.parent_id=body_paths.id)
) SELECT * FROM body_paths WHERE part in ('pupils', 'arms');

-- both produce
┌────┬────────┬───────────────────────┐
│ id ┆ part   ┆ path                  │
╞════╪════════╪═══════════════════════╡
│  6 ┆ arms   ┆ body.torso.arms       │
│  4 ┆ pupils ┆ body.head.eyes.pupils │
└────┴────────┴───────────────────────┘

Now on a CTE with less than ten rows it doesn't matter about optimizations, but in the general case, should the non-cross join be preferred or why would it ever be written the other way?

1

There are 1 best solutions below

1
GalacticCowboy On

With the where clause, the comma join is equivalent to an inner join. (Technically speaking, a cross join with a where clause is also equivalent to an inner join - only matching records from each table will be output, not all records in both tables.)

In terms of performance, there should be no difference as the optimizer will figure out the best query plan for it. However, in terms of maintenance and readability, there is - comma joins are harder to read and understand, especially as you add more tables to the query, and as your question indicates, their behavior is also not well understood. It's also harder to correlate which clauses affect which joins.

Comma joins are a much older syntax that has largely fallen out of favor, for good reason.