I have this table in Postgres:
CREATE TABLE public.products
(
    id character varying NOT NULL,
    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
    name character varying NOT NULL
);
For the id I'm using ULIDs (note the column type in Postgres is varchar with no fixed length).
I'm ready to change the column type if needed.
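For reference, a minimal sketch of such a change, assuming a fixed-length target type would be chosen (ULIDs are always 26 characters); the specific type here is only an illustration, not a recommendation:

-- Illustrative only: constrain id to the 26-character ULID length.
ALTER TABLE public.products
    ALTER COLUMN id TYPE character varying(26);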
SELECT "products".*
FROM "products"
WHERE "id" = ANY('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0WJ53MH0WKNR65CBX1,01HQDMCF0ZV7TYZWRR6RN5V4QT,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YCJ8EFYCBKHWN4VQN,01HQDMCF0Z46AY42FC8FR953D8,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0ZBQ3A70GE8K41RW1V,01HQDMCF0WNYW4ZH5G0M8MDAQA,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0W24YQHH4EKN91S0JY,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0YEQR7NJJJW88XNTW9,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF106M1667NNPTDBKQDB,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0ZMPTQ0GHG0V87W0J4,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WHNJ0P9BGVEFJE2JG,01HQDMCF109V71KW4MTFWVRTFR,and a lot more ~ 20k of them}')
LIMIT 37520 OFFSET 0
which is slow.
IMPORTANT
I'm passing a lot (~20k) of id values in the ANY() clause, and I think this is the problem.
If I use EXPLAIN ANALYZE it says:
Limit (cost=85.51..90.82 rows=154 width=172) (actual time=1.765..1.799 rows=138 loops=1)
-> Seq Scan on products (cost=85.51..90.82 rows=154 width=172) (actual time=1.764..1.791 rows=138 loops=1)
Filter: ((id)::text = ANY ('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0WJ53MH0WKNR65CBX1,01HQDMCF0ZV7TYZWRR6RN5V4QT,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YCJ8EFYCBKHWN4VQN,01HQDMCF0Z46AY42FC8FR953D8,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0ZBQ3A70GE8K41RW1V,01HQDMCF0WNYW4ZH5G0M8MDAQA,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0W24YQHH4EKN91S0JY,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0YEQR7NJJJW88XNTW9,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF106M1667NNPTDBKQDB,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0ZMPTQ0GHG0V87W0J4,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WHNJ0P9BGVEFJE2JG,01HQDMCF109V71KW4MTFWVRTFR,and a lot more ~ 20k of them}'::text[]))
Rows Removed by Filter: 16
Planning Time: 9.769 ms
Execution Time: 1.878 ms
I tried:
CREATE INDEX product_id_idx ON products(id);
and
CREATE INDEX product_id_idx ON products USING HASH(id);
and
CREATE INDEX product_id_pattern_idx ON products USING btree (id text_pattern_ops);
But they don't fix the extreme slowness.
What index can I create to improve the query?
I didn't go all the way up to 20k, but = ANY(ARRAY[]) does seem reluctant to use the indexes you made available, sequentially scanning a freshly VACUUM ANALYZE'd table with 200k rows. Demo 1.
It's resolved if you trade that for the equivalent IN (SELECT UNNEST(ARRAY[])). Demo 2. Now using the index, it goes down from 17 s to 1 ms.
Turntables: I'll leave the above in for context, but unless you're generating your ids similarly, it shouldn't apply. There might still be some advantage of IN over ANY, but your index should already work as it is. ANY: demo 3. IN: demo 4.
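As a sketch of what that rewrite looks like applied to your query (id list abbreviated here; the cast mirrors the text[] cast shown in your EXPLAIN output):

SELECT "products".*
FROM "products"
-- abbreviated: pass the same ~20k-element array literal (or a bound array parameter) here
WHERE "id" IN (SELECT unnest('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR}'::text[]))
LIMIT 37520 OFFSET 0;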