Can partitioning be used for LIKE queries in Postgres?

61 Views Asked by At

I have huge words table which I'm running LIKE query on:

create table words
(
  id   int,
  word varchar
)

It works pretty long. Index doesn't help a lot, so I'm trying to partition it by word column:

create table words
(
  id   int,
  word varchar
) partition by RANGE (word);

CREATE TABLE words_1 PARTITION OF words
  FOR VALUES FROM ('a') TO ('n');

CREATE TABLE words_2 PARTITION OF words
  FOR VALUES FROM ('n') TO ('z');

NOTE: Actually I'm planning to make 1 partition for each letter. Use only 2 of them for example simplicity.

So partitioning seems to work OK with equality and gt/lt operators:

explain
select * from words where word = 'abc'
Seq Scan on words_1 words  (cost=0.00..25.88 rows=6 width=36)
  Filter: ((word)::text = 'abc'::text)
explain
select * from words where word >= 'nth'
Seq Scan on words_2 words  (cost=0.00..25.88 rows=423 width=36)
  Filter: ((word)::text >= 'nth'::text)

But on LIKE queries it keeps scanning both partitions:

explain
select * from words where word LIKE 'abc%'
Append  (cost=0.00..51.81 rows=12 width=36)
  ->  Seq Scan on words_1  (cost=0.00..25.88 rows=6 width=36)
        Filter: ((word)::text ~~ 'abc'::text)
  ->  Seq Scan on words_2  (cost=0.00..25.88 rows=6 width=36)
        Filter: ((word)::text ~~ 'abc'::text)

Is there a way to make partitioning work on LIKE queries?

Maybe is there another way to achieve what i want?

2

There are 2 best solutions below

2
jjanes On BEST ANSWER

I'm a bit surprised it doesn't just work, at least in the C collation. But I can verify that it doesn't.

You could rewrite the query manually the same way word like 'abc%' sometimes gets rewritten:

explain analyze
select * from words where word >='abc' and word <'abd'

But this is only guaranteed to give the same answer in the C collation.

By the way, you should check partition pruning with EXPLAIN ANALYZE. It is possible for the partition pruning to only happen at run time, in which case all partitions still show up in the plan EXPLAIN plan. (But run-time pruning isn't the case here, I checked)

For non-C collation, you can use text_pattern_ops as Laurenz alludes to. As long as you spell it correctly, unlike my first attempt.

create table words2
(
  id   int,
  word varchar
) partition by RANGE (word text_pattern_ops);

But then you still need to rewrite the query in order to make the partition pruning happen.

explain analyze
select * from words2 where word like 'abc%' and word ~>=~ 'abc' and word  ~<~ 'abd';

You need both the LIKE and the inequality range, because the range alone might return false positives. (I don't know exactly when it might do that, but since the planner is worried about, I figure that worry is well-founded)

9
Schwern On

I cannot replicate your results. Your queries use an index, even with only a handful of rows. Make sure your tables have been analyzed and run your queries with explain analyze.


An improved table design would drop the id and use the word as a primary key, assuming they're unique. I've added about 1000 words and analyzed the table.

create table words ( word text primary key );

copy words(word) from '/Users/schwern/tmp/words.txt';

analyze words;

All of your queries do an index-only scan.

The default B-Tree index can do exact matches (words = 'this'), trailing wildcards (words like 'this%') and ordering. We can improve this further adding a Gist index using trigram ops.

create index word_gin_idx on words using gist(word gist_trgm_ops);

Now queries such as word like '%this%' will use the Gist index.