Movement of restart_lsn position movement of logical replication slots is very slow

1.4k Views Asked by At

We have two logical replication slots in our postgresql database (version-11) instance and we are using pgJDBC to stream data from these two slots. We are ensuring that when we regularly send feedback and update the confirmed_flush_lsn (every 10 minutes) for both the slots to the same position. However From our data we have seen that the restart_lsn movement of the two are not in sync and most of the time one of them lags too far behind to hold the WAL files unnecessarily. Here are some data points to indicate the problem

Thu Dec 10 05:37:13 CET 2020
                      slot_name       |  restart_lsn  | confirmed_flush_lsn 
--------------------------------------+---------------+---------------------
 db_dsn_metadata_src_private          | 48FB/F3000208 | 48FB/F3000208
 db_dsn_metadata_src_shared           | 48FB/F3000208 | 48FB/F3000208
(2 rows)



Thu Dec 10 13:53:46 CET 2020
                      slot_name      |  restart_lsn  | confirmed_flush_lsn 
-------------------------------------+---------------+---------------------
 db_dsn_metadata_src_private         | 48FC/2309B150 | 48FC/233AA1D0
 db_dsn_metadata_src_shared          | 48FC/233AA1D0 | 48FC/233AA1D0
(2 rows)


Thu Dec 10 17:13:51 CET 2020
                      slot_name      |  restart_lsn  | confirmed_flush_lsn 
-------------------------------------+---------------+---------------------
 db_dsn_metadata_src_private         | 4900/B4C3AE8  | 4900/94FDF908
 db_dsn_metadata_src_shared          | 48FD/D2F66F10 | 4900/94FDF908
(2 rows)

Though we are using setFlushLsn() and forceStatusUpdate for both the slot's stream regularly still the slot with name private is far behind the confirmed_flush_lsn and slot with name shared is also behind with confirmed_flush_lsn but not too far. Since the restart_lsn is not moving fast enough, causing lot of issues with WAL log file management and not allowing to delete them to free up disk space

How can this problem be solved? Are there any general guidelines to overcome this issue ?

We have seen another thread with similar question but no response there too. WALs getting pilled up - restart_lsn of logical replication not moving in PostgreSQL

I am using the sample program published by pgJDBC here : https://jdbc.postgresql.org/documentation/head/replication.html to get streaming changes from the postgresql here.
0

There are 0 best solutions below