postgres: partial Index on frequently update column

484 Views Asked by At

I'm new to postgres and I have a question about using partial index on frequently updated columns.

I have a huge table: job, which has the following columns. It has almost 50 millions rows.

Table and index

CREATE TABLE job
(
  id uuid,
  assigned_at timestamp with time zone,
  completed_at timestamp with time zone
)

The assigned_at column will be updated the the job was got by someone, and the completed_at column will be updated when the job was completed. So the table will be updated frequently.

I've tried to create a partial index like follows:

CREATE INDEX idx ON job (c_id) WHERE ((assigned_at IS NOT NULL) AND (completed_at IS NULL));

Updating Query

Now I want to clear assignments for jobs that have been assigned for more than 10 days. Here is my query. It takes long long time to execute:

update table set assigned_at = null where completed_at is null and (now() - assigned_at) > INTERVAL '10 days'

The index works well on testing environment, but it is not used in the online environment. I wonder if the frequently operations on online environment prevent the use of partial index? And how to accelerate the updating query's speed?

If anyone has any thoughts on this that would be appreciated. Thanks.

Explain analysis:

  1. in testing env:
                                                       QUERY PLAN         
-------------------------------------------------------------------------------------------------------------------------------------------------
 Update on job (cost=1592.47..1790362.54 rows=6083164 width=156) (actual time=31.447..31.448 rows=0 loops=1)
  -> Bitmap Heap Scan on job (cost=1592.47..1790362.54 rows=6083164 width=156) (actual time=1.180..6.174 rows=494 loops=1)
        Recheck Cond: ((assigned_at IS NOT NULL) AND (completed_at IS NULL))
        Filter: (assigned_at < (now() - '10 days'::interval))
        Rows Removed by Filter: 2585
        Heap Blocks: exact=2698
        -> Bitmap Index Scan on idx (cost=0.00..71.67 rows=7446475 width=0) (actual time=0.839..0.839 rows=3079 loops=1)
 Planning Time: 0.238 ms
 Execution Time: 31.487 ms
(9 rows)
  1. in online env:
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Update on job  (cost=0.00..1773961.63 rows=2275384 width=156) (actual time=56346.519..56346.521 rows=0 loops=1)
   ->  Seq Scan on job  (cost=0.00..1773961.63 rows=2275384 width=156) (actual time=0.212..55583.427 rows=693 loops=1)
         Filter: ((assigned_at IS NOT NULL) AND (completed_at IS NULL) AND ((now() - assigned_at) > '10 days'::interval))
         Rows Removed by Filter: 47839353
 Planning Time: 0.640 ms
 Execution Time: 56346.582 ms
2

There are 2 best solutions below

0
On

In your case index query probably may return too much rows (percent from all data, i.e low cardinality on not null), so optimizer do full scan instead index scan.

4
On

Presumably you would want to use the index to stop after 10 days too, not just to apply the NULL criteria. To do that, you would need to write the query with the indexable column on its own, as PostgreSQL won't do the algebra for you:

update job set assigned_at = null where completed_at is null and assigned_at < now() - INTERVAL '10 days'

For which you would want this index:

CREATE INDEX idx ON job (assigned_at) WHERE ((assigned_at IS NOT NULL) AND (completed_at IS NULL));

but it is not used in the online environment.

What did it do instead? Give the EXPLAIN (ANALYZE, BUFFERS), please (which will actually run the UPDATE).