Postgresql Ltree performance much worse with GisT index

270 Views Asked by At

I have this table in a Postgresql database:

CREATE TABLE games (
    uuid UUID PRIMARY KEY,
--  ... more data
    pgn ltree
);
CREATE INDEX pgn_gist_index on games USING GIST (pgn);
CREATE INDEX pgn_index ON games USING BTREE (pgn);

The table contains about 100M rows and the pgn column uses the ltree type from the ltree extension.

Now, from what I read on various posts on the internet and the ltree doc, a btree index (default on postgresql) is not ideal for ltree data so it is recommended to use gist. The issue is my performance are much worse with a gist index vs a btree index and I don't know why. When I use both indexes, postgresql incorrectly assumes the gist index is going to have better performance so the performance is still worse.

Running EXPLAIN SELECT COUNT(*) FROM games WHERE pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z';

With gist index:

 Aggregate  (cost=460.54..460.55 rows=1 width=8)
   ->  Bitmap Heap Scan on games  (cost=61.20..460.28 rows=102 width=0)
         Recheck Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
         ->  Bitmap Index Scan on pgn_gist_index  (cost=0.00..61.17 rows=102 width=0)
               Index Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)

with btree index:

 Finalize Aggregate  (cost=98223.46..98223.47 rows=1 width=8)
   ->  Gather  (cost=98223.25..98223.46 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=97223.25..97223.26 rows=1 width=8)
               ->  Parallel Seq Scan on games  (cost=0.00..97223.14 rows=42 width=0)
                     Filter: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)

As you can see, the expected cost with a gist index is MUCH lower than with a btree index (as I would expect). But then, when actually running the query, the performance are about 3 times worse:

Running EXPLAIN ANALYZE SELECT COUNT(*) FROM games WHERE pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z

with gist index:

 Aggregate  (cost=460.54..460.55 rows=1 width=8) (actual time=1305.588..1305.589 rows=1 loops=1)
   ->  Bitmap Heap Scan on games  (cost=61.20..460.28 rows=102 width=0) (actual time=1305.529..1305.581 rows=67 loops=1)
         Recheck Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
         Heap Blocks: exact=67
         ->  Bitmap Index Scan on pgn_gist_index  (cost=0.00..61.17 rows=102 width=0) (actual time=1305.469..1305.469 rows=67 loops=1)
               Index Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
 Planning Time: 0.473 ms
 Execution Time: 1305.880 ms

with btree index:

 Finalize Aggregate  (cost=98223.46..98223.47 rows=1 width=8) (actual time=411.258..413.445 rows=1 loops=1)
   ->  Gather  (cost=98223.25..98223.46 rows=2 width=8) (actual time=411.101..413.439 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=97223.25..97223.26 rows=1 width=8) (actual time=401.035..401.035 rows=1 loops=3)
               ->  Parallel Seq Scan on games  (cost=0.00..97223.14 rows=42 width=0) (actual time=14.101..401.011 rows=22 loops=3)
                     Filter: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
                     Rows Removed by Filter: 339978
 Planning Time: 0.339 ms
 Execution Time: 413.519 ms

Note: I ran ANALYZE games; after modifying the indexes. What could explain the difference between the expected cost and the actual performance?

0

There are 0 best solutions below