Postgres picks sequential scan instead of a much faster index scan on multiple INNER JOINs

283 Views Asked by At

We have the following query:

SELECT ...
FROM Client c
INNER JOIN Address a1 ON c.Address1 = a1.Id
INNER JOIN Address a2 ON c.Address2 = a2.Id
INNER JOIN Address a3 ON c.Address3 = a3.Id
WHERE c.Status = 0
AND a1.Status = 0
AND a2.Status = 0
AND a3.Status = 0
AND c.Id = 'specific id'

As you can see, we join several times to the same table, using different foreign keys. All of the tables have a Status column, and we filter Status = 0 for each table.

When analyzing this with EXPLAIN ANALYZE, one can see that the query planner chooses an index scan for all of the tables, except for one of the Address tables (the a2 joined by c.Address2). This single table is scanned with a sequential scan for Status = 0 and later joined with c.Address2 = a2.Id.

This strategy is much more costly than an index scan. We select just one Client row, so the whole query returns just one line, so the c.Address2 = a2.Id predicate is much more limiting than the a2.Status = 0 one.

OK so I thought maybe there is something special with this foreign key, which causes the statistics to be misleading to the query optimizer. So I removed the join to a2, which left us with the following query:

SELECT ...
FROM Client c
INNER JOIN Address a1 ON c.Address1 = a1.Id
INNER JOIN Address a3 ON c.Address3 = a3.Id
WHERE c.Status = 0
AND a1.Status = 0
AND a3.Status = 0
AND c.Id = 'specific id'

But now the query optimizer uses a sequential scan on a3. a1 is still scanned using the index.

I always thought the query optimizer uses the table statistics (Address in this case) together with the limiting query (aX.Status = 0 and c.AddressX = aX.Id) to decide which plan to use. But here it's more than that. When I include the join to a2, it uses a sequential scan for a2 and an index scan for the rest. When I remove the join to a2, it uses a sequential scan for a3. So the chosen strategy for a2 depends on whether the unrelated join to a3 is present or not.

What could be the reason for this behavior?

Please note that statistics is up to date and the tables have all been vacuuumed.

0

There are 0 best solutions below