postgres trigram index is too slow with ilike search

1.2k Views Asked by At

I'm doing a pattern matching search with ILIKE in our system, but it gets too slow with some tables due to the amount of records in the table. So I'm implementing trigram index following instructions in this post https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/. I'm not using full text search because I need searchs like '%xxx%' and full text search does not work well with that. The test table has 16000 records and I have created a new column in the table for the search concatenating some other columns.

I've run some test and this are de results:

SELECT * FROM "table" WHERE "searchField" ILIKE '%ose%'

NO INDEX 1639 rows 30.3 sec. avg.

GIN INDEX 1639 rows 26.4 sec. avg.

SELECT * FROM "table" WHERE "searchField" ILIKE '%ose%' OR "searchField" ILIKE '%ria%'

NO INDEX 1639 rows 3:02 min. avg.

GIN INDEX 1639 rows 2.56 min. avg.

As you can see it's not a great inprovement, the post said that query time reduce to miliseconds. The explain analyze shows this:

Bitmap Heap Scan on "table"  (cost=22.31..1827.93 rows=1331 width=511)
                             (actual time=0.354..4.644 rows=1639 loops=1)
  Recheck Cond: (("searchField")::text ~~* '%ose%'::text)
  Heap Blocks: exact=585
  ->  Bitmap Index Scan on "table_trgm_gin"  (cost=0.00..21.98 rows=1331 width=0)
                                             (actual time=0.276..0.276 rows=1639 loops=1)
        Index Cond: (("searchField")::text ~~* '%ose%'::text)

The index scan is fast but the condition recheck is too slow. I have read that rechecking is unavoidable due to false positives posibilities. But then I don´t know how to get better results.

Can anyone explain why the index does not make much of a difference and how to get better query times?

1

There are 1 best solutions below

2
On

The EXPLAIN (ANALYZE) you show must be from a different table, because there the duration is under 5 milliseconds.

I notice that the patterns you search for are very short (3 characters).

Trigram indexes don't perform good for short patterns, because many rows will match during the index scan and all of these rows have to be rechecked.

Two things to check to see if my analysis is correct:

  • Test with longer patterns and see if the performance improves.
  • Look at the EXPLAIN (ANALYZE) output of the query that takes three minutes and see if a lot of rows are found during the index scan.

If I am right, there is not much you can do. Looking for short patterns just doesn't perform very well. You could try to limit the minimum length of characters in a pattern to avoid the problem.