PostgreSQL update slow


My update query takes days to run, and it seems to get slower every time it runs in production; I'm hoping to make it faster. The query looks like this:

UPDATE table_1 a
SET boolean_column = TRUE
WHERE a.boolean_column = FALSE
AND EXISTS(SELECT b.joining_key
         FROM table_2 b
         WHERE a.primary_key = b.joining_key
           AND b.category = 'abc');

execution plan:

Update on table_1 a  (cost=0.00..1137712.90 rows=1 width=136)
  ->  Nested Loop Semi Join  (cost=0.00..1137712.90 rows=1 width=136)
        Join Filter: (a.primary_key = (b.joining_key)::text)
        ->  Seq Scan on table_1 a  (cost=0.00..36152.13 rows=1 width=129)
              Filter: (NOT boolean_column)
        ->  Seq Scan on table_2 b  (cost=0.00..1091459.31 rows=808117 width=33)
              Filter: ((category)::text = 'abc'::text)

table_1 has ~450K rows, and table_2 has ~4.5 million rows, ~800K of which have category = 'abc'.
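One idea I have been considering (not yet tried) is adding indexes so the planner can avoid the two sequential scans in the semi join. A sketch, assuming the column names above; the index names are just placeholders:

CREATE INDEX IF NOT EXISTS table_2_category_joining_key_idx
    ON table_2 (category, joining_key);

-- partial index covering only the rows this UPDATE still needs to touch
CREATE INDEX IF NOT EXISTS table_1_not_yet_updated_idx
    ON table_1 (primary_key)
    WHERE boolean_column = FALSE;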

Strange thing is, when I run the same query for a different category, it only takes ~15 minutes:

UPDATE table_1 a
SET boolean_column = TRUE
WHERE a.boolean_column = FALSE
AND EXISTS(SELECT b.joining_key
         FROM table_2 b
         WHERE a.primary_key = b.joining_key
           AND b.category = 'bcd');

execution plan:

Update on table_1 a  (cost=0.00..1154676.35 rows=1 width=136)
  ->  Nested Loop Semi Join  (cost=0.00..1154676.35 rows=1 width=136)
        Join Filter: (a.primary_key = (b.joining_key)::text)
        ->  Seq Scan on table_1 a  (cost=0.00..36152.13 rows=1 width=129)
              Filter: (NOT boolean_column)
        ->  Seq Scan on table_2 b  (cost=0.00..1091533.00 rows=2159298 width=33)
              Filter: ((category)::text = 'bcd'::text)

And in this case, table_2 has ~2 million rows with category = 'bcd'.
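For what it's worth, the plans above are only estimates. To capture actual row counts and timings I could run EXPLAIN (ANALYZE, BUFFERS); since that actually executes the UPDATE, wrapping it in a transaction and rolling back keeps the data unchanged (though the rolled-back rows still become dead tuples):

BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE table_1 a
SET boolean_column = TRUE
WHERE a.boolean_column = FALSE
  AND EXISTS(SELECT 1
             FROM table_2 b
             WHERE a.primary_key = b.joining_key
               AND b.category = 'abc');

ROLLBACK;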

I don't know if it's related, but I looked at some stats for table_2 and saw a large number of dead tuples:

SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'table_2'
ORDER BY n_dead_tup
         / (n_live_tup * current_setting('autovacuum_vacuum_scale_factor')::float8
            + current_setting('autovacuum_vacuum_threshold')::float8)
         DESC;

/* returns
schemaname  relname  n_live_tup     n_dead_tup  last_autovacuum
public      table_2  4363942        549768      2021-11-09 03:15:53.936604+00:00
*/

Would running VACUUM help in this case? I am skeptical, though, because again, other categories seem to run fine. I can provide additional information if needed - thanks.
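If vacuuming is worth a try, this is what I would run manually; the ANALYZE option also refreshes the planner statistics, which might matter given the rows=1 estimates on table_1 in the plans above:

VACUUM (VERBOSE, ANALYZE) table_2;
ANALYZE table_1;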

EDIT 1: table_1 has only one constraint, its primary key (table_1.primary_key), and table_2 likewise has only its primary key (table_2.primary_key). Currently, table_2.joining_key does not have a foreign key constraint referencing table_1.primary_key. Would it help to add that constraint?
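For reference, the constraint I have in mind would look like this (the constraint name is just a placeholder). My understanding is that PostgreSQL does not automatically create an index on the referencing column, so the foreign key by itself would not speed up the join; an index on table_2.joining_key would be the part that matters:

ALTER TABLE table_2
    ADD CONSTRAINT table_2_joining_key_fkey
    FOREIGN KEY (joining_key) REFERENCES table_1 (primary_key);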
