Understanding auto-vacuum and when it is triggered

2.6k Views Asked by At

We've noticed one of our tables growing considerably on PG 12. This table is the target of very frequent updates, with a mix of column types, including a very large text column (often with over 50kb of data) - we run a local cron job that looks for rows older than X time and set the text column to a null value (as we no longer need the data for that particular column after X amount of time).

We understand this does not actually free up disk space due to the MVCC model, but we were hoping that auto-vacuum would take care of this. To our surprise, the table continues to grow (now over 40gb worth) without auto-vacuum running. Running a vacuum manually has addressed the issue and we no longer see growth.

This has lead me to investigate other tables, I'm realising that I don't understand how auto-vacuum is triggered at all.

Here is my understanding of how it works, which hopefully someone can pick apart:

  • I look for tables that have a large amount of dead tuples in them: select * from pg_stat_all_tables ORDER BY n_dead_tup desc;
  • I identify tableX with 33169557 dead tuples (n_dead_tup column).
  • I run a select * from pg_class ORDER BY reltuples desc; to check how many estimated rows there are on table tableX
  • I identify 1725253 rows via the reltuples column.
  • I confirm my autovacuum settings: autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2
  • I apply the formula threshold + pg_class.reltuples * scale_factor, so, 50 + 1725253 * 0.2 which returns 345100.6

It is my understanding that auto-vacuum will start on this table once ~345100 dead tuples are found. But tableX is already at a whopping 33169557 dead tuples!, The last_autovacuum on this table was back in February.

Any clarification would be welcome.

1

There are 1 best solutions below

4
On BEST ANSWER

Your algorithm is absolutely correct.

Here are some reasons why things could go wrong:

  • autovacuum runs, but is so slow that it never gets done

    If you see no running autovacuum, that is not your problem.

  • autovacuum runs, but a long running open transaction prevents it from removing dead tuples

  • other tables need to be vacuumed more urgently (to avoid transaction ID wraparound), so the three workers are busy with other things

  • autovacuum runs, but conflicts with high concurrent locks on the table (LOCK TABLE, ALTER TABLE, ...)

    This makes autovacuum give up and try again later.

  • autovacuum is disabled, perhaps only for that table