I'm new to Postgres and I have a question about using a partial index on frequently updated columns.
I have a huge table, job, with the following columns. It has almost 50 million rows.
Table and index
CREATE TABLE job
(
id uuid,
assigned_at timestamp with time zone,
completed_at timestamp with time zone
)
The assigned_at
column is updated when the job is picked up by someone, and the completed_at
column is updated when the job is completed, so the table is updated frequently.
I've tried to create a partial index as follows:
CREATE INDEX idx ON job (c_id) WHERE ((assigned_at IS NOT NULL) AND (completed_at IS NULL));
Updating Query
Now I want to clear assignments for jobs that have been assigned for more than 10 days. Here is my query; it takes a very long time to execute:
update job set assigned_at = null where completed_at is null and (now() - assigned_at) > INTERVAL '10 days'
The index works well in the testing environment, but it is not used in the online environment. I wonder whether the frequent update operations in the online environment prevent the use of the partial index? And how can I speed up the update query?
If anyone has any thoughts on this, that would be appreciated. Thanks.
EXPLAIN ANALYZE output:
- in testing env:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Update on job (cost=1592.47..1790362.54 rows=6083164 width=156) (actual time=31.447..31.448 rows=0 loops=1)
-> Bitmap Heap Scan on job (cost=1592.47..1790362.54 rows=6083164 width=156) (actual time=1.180..6.174 rows=494 loops=1)
Recheck Cond: ((assigned_at IS NOT NULL) AND (completed_at IS NULL))
Filter: (assigned_at < (now() - '10 days'::interval))
Rows Removed by Filter: 2585
Heap Blocks: exact=2698
-> Bitmap Index Scan on idx (cost=0.00..71.67 rows=7446475 width=0) (actual time=0.839..0.839 rows=3079 loops=1)
Planning Time: 0.238 ms
Execution Time: 31.487 ms
(9 rows)
- in online env:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Update on job (cost=0.00..1773961.63 rows=2275384 width=156) (actual time=56346.519..56346.521 rows=0 loops=1)
-> Seq Scan on job (cost=0.00..1773961.63 rows=2275384 width=156) (actual time=0.212..55583.427 rows=693 loops=1)
Filter: ((assigned_at IS NOT NULL) AND (completed_at IS NULL) AND ((now() - assigned_at) > '10 days'::interval))
Rows Removed by Filter: 47839353
Planning Time: 0.640 ms
Execution Time: 56346.582 ms
In your case the index scan would probably return too many rows (a large percentage of all the data, i.e. low selectivity of the NOT NULL condition), so the optimizer does a full scan instead of an index scan. Note that in the online plan the planner's row estimate (2,275,384) is far above the actual count (693), which suggests the table statistics are stale.
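A minimal sketch of two things worth trying, assuming the job table and idx from the question: refresh the statistics so the planner sees realistic row counts, and rewrite the filter so assigned_at is compared directly to a constant rather than appearing inside an expression:

```sql
-- Refresh the planner's statistics for this table; a stale estimate
-- (2,275,384 estimated vs. 693 actual rows in the online plan) can push
-- the planner toward a sequential scan.
ANALYZE job;

-- Sargable rewrite: assigned_at is compared to a value computed once,
-- and the WHERE clause implies the partial index's predicate
-- (assigned_at IS NOT NULL AND completed_at IS NULL), so the planner
-- can consider idx.
UPDATE job
SET    assigned_at = NULL
WHERE  completed_at IS NULL
  AND  assigned_at < now() - INTERVAL '10 days';
```

The testing-environment plan already shows the filter in this form (`assigned_at < (now() - '10 days'::interval)`), so if the rewrite alone doesn't change the online plan, the row estimate is the main suspect.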