I have a read-only table with 80 million rows :
Column | Type | Modifiers | Storage | Stats target | Description
-------------+------------------------+-----------+----------+--------------+-------------
id | character(11) | not null | extended | |
gender | character(1) | | extended | |
postal_code | character varying(10) | | extended | |
operator | character varying(5) | | extended | |
Indexes:
"categorised_phones_pkey" PRIMARY KEY, btree (id)
"operator_idx" btree (operator)
"postal_code_trgm_idx" gin (postal_code gin_trgm_ops)
id is Primary Key and contains unique mobile numbers. Table rows looks like this:
id | gender | postal_code | operator
----------------+--------------+----------------+------------
09567849087 | m | 7414776788 | mtn
09565649846 | f | 1268398732 | mci
09568831245 | f | 7412556443 | mtn
09469774390 | m | 5488312790 | mci
This query takes almost ~65 seconds for the first time and ~8 seconds for next times:
select operator,count(*) from categorised_phones where postal_code like '1%' group by operator;
And the output looks like this:
operator | count
----------+---------
mci | 4050314
mtn | 6235778
And the output of explain alanyze :
HashAggregate (cost=1364980.61..1364980.63 rows=2 width=10) (actual time=8257.026..8257.026 rows=2 loops=1)
Group Key: operator
-> Bitmap Heap Scan on categorised_phones (cost=95969.17..1312915.34 rows=10413054 width=2) (actual time=1140.803..6332.534 rows=10286092 loops=1)
Recheck Cond: ((postal_code)::text ~~ '1%'::text)
Rows Removed by Index Recheck: 25105697
Heap Blocks: exact=50449 lossy=237243
-> Bitmap Index Scan on postal_code_trgm_idx (cost=0.00..93365.90 rows=10413054 width=0) (actual time=1129.270..1129.270 rows=10287127 loops=1)
Index Cond: ((postal_code)::text ~~ '1%'::text)
Planning time: 0.540 ms
Execution time: 8257.392 ms
How can I make this query faster?
Any idea would be great appreciated.
P.S:
I'm using PostgreSQL 9.6.1
UPDATE
I just updated the question. I disabled Parallel Query and results changed.
For queries that involve comparisons of the form
LIKE '%start', and following PostgreSQL own advice, you can use the following index:With that index in place, and some simulated data, your execution plan could very likely look like:
You can check it at dbfiddle here
If you have both queries with
LIKE 'start%'andLIKE '%middle%', you should add this index, but keep the one already in place. Trigram indexes might prove useful with this second kind of match.Why?
From PostgreSQL documentation on operator classes:
From PostgreSQL documentation on Index Types
UPDATE
If the queries performed involved always a fix (and relatively small) number of
LIKE 'x%'expressions, consider usingpartial indexes.For instance, for
LIKE '1%', you'd have the following index, and the following query plan (it shows about a 3x improvement):