I have this table in a Postgresql database:
CREATE TABLE games (
uuid UUID PRIMARY KEY,
-- ... more data
pgn ltree
);
CREATE INDEX pgn_gist_index on games USING GIST (pgn);
CREATE INDEX pgn_index ON games USING BTREE (pgn);
The table contains about 100M rows and the pgn column uses the ltree type from the ltree extension.
Now, from what I read on various posts on the internet and the ltree doc, a btree index (default on postgresql) is not ideal for ltree data so it is recommended to use gist. The issue is my performance are much worse with a gist index vs a btree index and I don't know why. When I use both indexes, postgresql incorrectly assumes the gist index is going to have better performance so the performance is still worse.
Running EXPLAIN SELECT COUNT(*) FROM games WHERE pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z';
With gist index:
Aggregate (cost=460.54..460.55 rows=1 width=8)
-> Bitmap Heap Scan on games (cost=61.20..460.28 rows=102 width=0)
Recheck Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
-> Bitmap Index Scan on pgn_gist_index (cost=0.00..61.17 rows=102 width=0)
Index Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
with btree index:
Finalize Aggregate (cost=98223.46..98223.47 rows=1 width=8)
-> Gather (cost=98223.25..98223.46 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=97223.25..97223.26 rows=1 width=8)
-> Parallel Seq Scan on games (cost=0.00..97223.14 rows=42 width=0)
Filter: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
As you can see, the expected cost with a gist index is MUCH lower than with a btree index (as I would expect). But then, when actually running the query, the performance are about 3 times worse:
Running EXPLAIN ANALYZE SELECT COUNT(*) FROM games WHERE pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z
with gist index:
Aggregate (cost=460.54..460.55 rows=1 width=8) (actual time=1305.588..1305.589 rows=1 loops=1)
-> Bitmap Heap Scan on games (cost=61.20..460.28 rows=102 width=0) (actual time=1305.529..1305.581 rows=67 loops=1)
Recheck Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
Heap Blocks: exact=67
-> Bitmap Index Scan on pgn_gist_index (cost=0.00..61.17 rows=102 width=0) (actual time=1305.469..1305.469 rows=67 loops=1)
Index Cond: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
Planning Time: 0.473 ms
Execution Time: 1305.880 ms
with btree index:
Finalize Aggregate (cost=98223.46..98223.47 rows=1 width=8) (actual time=411.258..413.445 rows=1 loops=1)
-> Gather (cost=98223.25..98223.46 rows=2 width=8) (actual time=411.101..413.439 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=97223.25..97223.26 rows=1 width=8) (actual time=401.035..401.035 rows=1 loops=3)
-> Parallel Seq Scan on games (cost=0.00..97223.14 rows=42 width=0) (actual time=14.101..401.011 rows=22 loops=3)
Filter: (pgn ~ '*.Qg8y.Rxg8|Nxg8.Nf7z'::lquery)
Rows Removed by Filter: 339978
Planning Time: 0.339 ms
Execution Time: 413.519 ms
Note: I ran ANALYZE games; after modifying the indexes. What could explain the difference between the expected cost and the actual performance?