SQL correlated columns filter and sort

60 Views Asked by At

I have a table with many columns with different types, which I am using to filter and sort results (it is similar to faceted search with sorting).

Let's consider 2 columns: A and B. For each column an index is created.

We want to query A=value ORDER BY B LIMIT 10.

If the planner will use A index then it must scan all rows with A=value and sort them by B.

If the planner will use B index then it scan all rows until finds 10 rows with A=value.

We have 2500000 rows in total, 50000 rows with A=value, so there is one A=value per 50 rows, so it thinks to scan only 50*10=500 rows to find 10 A=value rows. However, it is wrong, because there is correlation between both fields and A=value rows are not evenly distributed, but they are at the end of B index, so it actually need to scan more than 2000000 rows.

If we disable B index then it is fine. However, we want B index for other queries.

How to solve the issue?

Creating (A,B ASC) index is not a good solution. Because there would be to many indexes and A might be of array type which doesn't support such index.

1

There are 1 best solutions below

1
jjanes On

There are no great general solutions for this. You can force it to not use the index on B for ordering on a case by case basis in SQL by adding some dummy operation to it, like:

ORDER BY B+0 LIMIT 10

or

ORDER BY B||'' LIMIT 10

There is always the possibility that some future version of PostgreSQL could "see through" these dummy operations and use that index anyway.

If the index on B is only desired to support equality comparisons, then you can make it a HASH index or a GIN index (via the btree_gin extension), which can't be used for ordering but can still be used for equality.