PostgreSQL autovacuum on partitioned tables

PostgreSQL 9.5.2 on Amazon RDS

select name,setting from pg_settings 
where name like '%vacuum%' 
order by name;
                name                 |  setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.05
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 450000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 30
 autovacuum_vacuum_cost_delay        | 20
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | 0
 rds.force_autovacuum_logging_level  | log
 vacuum_cost_delay                   | 0
 vacuum_cost_limit                   | 300
 vacuum_cost_page_dirty              | 20
 vacuum_cost_page_hit                | 1
 vacuum_cost_page_miss               | 10
 vacuum_defer_cleanup_age            | 0
 vacuum_freeze_min_age               | 50000000
 vacuum_freeze_table_age             | 250000000
 vacuum_multixact_freeze_min_age     | 5000000
 vacuum_multixact_freeze_table_age   | 150000000
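
For reference, this is how I understand those settings to combine (the formulae are from the PostgreSQL docs; the row count is taken from one of my partitions shown further down):

-- Autovacuum vacuums a table once dead tuples exceed
-- autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
-- For example, for a 31,348,368-row partition:
select 50 + 0.1 * 31348368 as dead_tuple_vacuum_trigger;  -- ~3,134,887 dead tuples

-- Separately, once age(relfrozenxid) passes autovacuum_freeze_max_age
-- (450,000,000 here), autovacuum forces an anti-wraparound vacuum even if
-- the table has no dead tuples at all.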

I've been trying to figure out how autovacuum is behaving in two Postgres databases. The databases are identical in size, parameters, and structure. (These are two data warehouses for the same application, in different locations and with different data patterns.)

We are using partitions for some of our very large tables. I've noticed that the older (static) partitions are regularly getting autovacuumed. I understand that XIDs are frozen, but the relation does need periodic vacuuming to look for any new XIDs.

I've been using this query to look for relations that will require vacuuming to avoid XID wraparound:

SELECT 'Relation Name',age(c.relfrozenxid) c_age, age(t.relfrozenxid) t_age,
       greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm')
order by age desc limit 5;

   ?column?    |   c_age   |   t_age   |    age
---------------+-----------+-----------+-----------
 Relation Name | 461544753 |           | 461544753
 Relation Name | 461544753 |           | 461544753
 Relation Name | 461544753 |           | 461544753
 Relation Name | 461544753 |           | 461544753
 Relation Name | 461544753 | 310800517 | 461544753

All of the relations listed are old, stable partitions. The column relfrozenxid is defined as: "All transaction IDs before this one have been replaced with a permanent ("frozen") transaction ID in this table. This is used to track whether the table needs to be vacuumed in order to prevent transaction ID wraparound or to allow pg_clog to be shrunk." Note that the ages above (~461 million) already exceed our autovacuum_freeze_max_age of 450 million, so presumably autovacuum keeps forcing anti-wraparound vacuums on these relations regardless of dead-tuple counts.
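
As a cluster-wide sanity check, the same age test can be run against pg_database, since datfrozenxid is the per-database counterpart of relfrozenxid:

select datname, age(datfrozenxid) as age
from pg_database
order by age desc;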

Out of curiosity I looked at relfrozenxid for all of the partitions of a particular table:

SELECT c.oid::regclass as table_name,age(c.relfrozenxid) as age, c.reltuples::int, n_live_tup, n_dead_tup,
         date_trunc('day',last_autovacuum)
FROM pg_class c
JOIN pg_stat_user_tables u on c.relname = u.relname
WHERE c.relkind IN ('r', 'm')
and  c.relname like 'tablename%'
order by age desc;

      table_name                     |    age    | reltuples | n_live_tup | n_dead_tup |       date_trunc
-------------------------------------+-----------+-----------+------------+------------+------------------------
 schema_partition.tablename_201202   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201306   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201204   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201110   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201111   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201112   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201201   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201203   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201109   | 460250527 |         0 |          0 |          0 | 2018-09-23 00:00:00+00
 schema_partition.tablename_201801   | 435086084 |  37970232 |   37970230 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201307   | 433975635 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201107   | 433975635 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201312   | 433975635 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201311   | 433975635 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201401   | 433975635 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201310   | 423675180 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201704   | 423222113 |  43842668 |   43842669 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201612   | 423222113 |  65700844 |   65700845 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201705   | 423221655 |  46847336 |   46847338 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201702   | 423171142 |  50701032 |   50701031 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_overflow | 423171142 |       754 |        769 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201106   | 421207271 |         1 |          1 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201309   | 421207271 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201108   | 421207271 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201308   | 421207271 |         0 |          0 |          0 | 2018-09-25 00:00:00+00
 schema_partition.tablename_201806   | 374122782 |  44626756 |   44626757 |          0 | 2018-09-26 00:00:00+00
 schema.tablename                    | 360135561 |         0 |          0 |          0 | 2018-09-27 00:00:00+00

I'm pretty sure I don't fully understand how relfrozenxid works, but it does appear that the partition tables are affected by the parent table (which would affect the relfrozenxid value for the partitioned table). I can't find any documentation about this. I would have thought that for static tables the relfrozenxid would remain static until a vacuum occurred.
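
One thing I plan to verify (a sketch; the partition name is just the first one from the output above): whether the stored relfrozenxid value itself is changing, or only its age. As far as I can tell, age(xid) is measured against the cluster's current transaction counter, so it grows for every table as the cluster consumes XIDs, even with no writes to the table itself.

SELECT c.oid::regclass AS table_name,
       c.relfrozenxid,             -- the stored XID: only a vacuum advances it
       age(c.relfrozenxid) AS age  -- grows with every transaction in the cluster
FROM pg_class c
WHERE c.oid = 'schema_partition.tablename_201202'::regclass;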

Additionally, I have a handful of relations with static data that apparently have never been autovacuumed (last_autovacuum is null). Could this be a result of a VACUUM FREEZE operation?
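
To separate those cases, I'm also checking the vacuum counters alongside the timestamps; a manual VACUUM (or VACUUM FREEZE) should show up under last_vacuum / vacuum_count rather than last_autovacuum:

SELECT relname, last_vacuum, last_autovacuum,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE last_autovacuum IS NULL
order by relname;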

I am new to Postgres, and I readily admit to not fully understanding the autovacuum process.

I'm not seeing any performance problems that I can identify.

Edit:

I set up a query to run every 4 hours against one partitioned table:

SELECT c.oid::regclass as table_name,age(c.relfrozenxid) as age , c.reltuples::int, n_live_tup, n_dead_tup,
         date_trunc('day',last_autovacuum)
FROM pg_class c
JOIN pg_stat_user_tables u on c.relname = u.relname
WHERE c.relkind IN ('r', 'm')
and  c.relname like 'sometable%'
order by age desc;

Looking at two different partitions, here is the output for the last 20 hours:

 schemaname.sometable_201812   | 206286536 |         0 |          0 |          0 |
 schemaname.sometable_201812   | 206286537 |         0 |          0 |          0 |
 schemaname.sometable_201812   | 225465100 |         0 |          0 |          0 |
 schemaname.sometable_201812   | 225465162 |         0 |          0 |          0 |
 schemaname.sometable_201812   | 225465342 |         0 |          0 |          0 |
 schemaname.sometable_201812   | 236408374 |         0 |          0 |          0 |
-bash-4.2$  grep 201610 test1.out
 schemaname.sometable_201610   | 449974426 |  31348368 |   31348369 |          0 | 2018-09-22 00:00:00+00
 schemaname.sometable_201610   | 449974427 |  31348368 |   31348369 |          0 | 2018-09-22 00:00:00+00
 schemaname.sometable_201610   | 469152990 |  31348368 |   31348369 |          0 | 2018-09-22 00:00:00+00
 schemaname.sometable_201610   |  50000051 |  31348368 |   31348369 |          0 | 2018-10-10 00:00:00+00
 schemaname.sometable_201610   |  50000231 |  31348368 |   31348369 |          0 | 2018-10-10 00:00:00+00
 schemaname.sometable_201610   |  60943263 |  31348368 |   31348369 |          0 | 2018-10-10 00:00:00+00

The relfrozenxid of partitions is being modified even though there is no direct DML to the partition. I would assume that inserts to the base table are somehow modifying the relfrozenxid of the partitions.
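
To predict when the next forced vacuum will hit each partition, I've added a headroom calculation to the 4-hourly run (a sketch; xids_until_forced_autovacuum is just my own label):

SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS age,
       current_setting('autovacuum_freeze_max_age')::int
           - age(c.relfrozenxid) AS xids_until_forced_autovacuum
FROM pg_class c
WHERE c.relkind IN ('r', 'm')
and  c.relname like 'sometable%'
order by age desc;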

The partition sometable_201610 has 31 million rows but is static. When I look at the log files, the autovacuum of this type of partition is taking 20-30 minutes. I don't know if that is a performance problem or not, but it does seem expensive. The autovacuum entries in the log files show that typically several of these large partitions are autovacuumed every night. (There are also lots of partitions with zero tuples that are autovacuumed, but these take very little time.)
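
One idea I'm considering (just a sketch, not something I've settled on): manually freezing the static partitions once, during a quiet window, so the periodic anti-wraparound autovacuum finds nothing left to freeze:

-- One-off per static partition:
VACUUM (FREEZE, VERBOSE) schemaname.sometable_201610;

Even then, as I understand it, an anti-wraparound vacuum on 9.5 still has to read the whole table; the all-frozen bit in the visibility map that lets it skip already-frozen pages was only added in 9.6. That may be why these passes take 20-30 minutes no matter what.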
