BTREE vs GIN vs GIST index

8k Views Asked by At


I'm using Postgres DB and I have a table called MyObjects with several varchar columns. The values will not be separated words (for example, urls, names etc..). I will be filtering according to these columns a lot. Searching rows that a certain word will correspond in the beginning, middle or ending of the value. The table itself will not be large (the number of rows in the beginning will be around n*10^3 but it will grow to be around n*10^5 rows) Is there even any need for an index in this case? If so - what kind of index do you suggest?
Thanks!

2

There are 2 best solutions below

0
On

If you are looking for an index that can help with search patterns as WHERE col LIKE '%string%', the only option is a GIN or GiST index using pg_tgrm.

These indexes can become quite large, and they don't help a lot if you search for short substrings.

To figure out if such an index would bring benefits over a sequential table scan in your case, you'd have to try it out – it cannot be answered in general.

0
On

From PostgreSQL documentation:

11.2. Index Types

  • By default, the CREATE INDEX command creates B-tree indexes
  • The other index types are selected by writing the keyword USING followed by the index type name
  • B-trees can handle equality and range queries on data that can be sorted (< <= = >= > BETWEEN, IN)
  • The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'.

So, BTree is out, because you want to match to middle and end. Now, looking at this page:

PostgreSQL: More performance for LIKE and ILIKE statements

It seems Gin is the way to go. You should activate pg_trgm as mentioned by @Laurenz Albe:

CREATE EXTENSION pg_trgm;

The query should look something like this:

CREATE INDEX idx_gin ON MyObjects USING gin (column_name gin_trgm_ops);

Hope this helps. Please read the cybertec post thoroughly for detailed info.

To answer your other question, (Is there even need for an index) my observation is that if you grow to 10^5, yes, you might very well need an index. It's not that 10^5 is too large, it's that you might want to do many searches on that dataset. I've had a 180000 row table (Pretty small right?) but, for one process I was searching 20000 records in there. With a simple BTree, I got the query exec time from 271 seconds down to 343 ms. This was on SQLite. So, yeah, I'd go with an index.