I am trying to create PostgreSQL - Etcd - Patroni(PEP) cluster. There are lots of examples on the internet and I have created one and it runs perfect. Yet, this architecture should comply with my company' s backup solution which is NetApp. We are putting the database into backup mode with "SELECT pg_start_backup('test_backup', true);" and then copy all the data files to backup directory.
PEP cluster has a small problem with this solution. Taking backup is running fine, but restoration point is not that much good. In order to restore the leader of the PEP cluster I need stop the database and then move the backup files to the data directory and finally start the restoration. At this point Patroni says the restoration node is a new cluster. Here is the error:
raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
2022-04-11 12:49:29,930 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-04-11 12:49:29,942 INFO: Lock owner: None; I am pgsql_node1
2022-04-11 12:49:29,962 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
Also, when I check the patroni cluster status I saw this:
root@4cddca032454:/data/backup# patronictl -c /etc/patroni/config.yml list
+ Cluster: pgsql (7085327534197401486) --------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+------------+---------+---------+----+-----------+
| pgsql_node1 | 172.17.0.6 | Replica | stopped | | unknown |
| pgsql_node2 | 172.17.0.7 | Replica | running | 11 | 0 |
| pgsql_node3 | 172.17.0.8 | Replica | running | 11 | 0 |
+-------------+------------+---------+---------+----+-----------+
At this point I have a PEP cluster without a leader. So, how can I solve this issue?
(Note: The restoration node attempted to join right cluster because, before starting the restoration I check cluster status and got this result:
root@4cddca032454:/data/backup# patronictl -c /etc/patroni/config.yml list
+ Cluster: pgsql (7085327534197401486) --------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+------------+---------+---------+----+-----------+
| pgsql_node2 | 172.17.0.7 | Replica | running | 11 | 0 |
| pgsql_node3 | 172.17.0.8 | Replica | running | 11 | 0 |
+-------------+------------+---------+---------+----+-----------+
pgsql_node1 is not there.
)
As explained here, "https://patroni.readthedocs.io/en/latest/existing_data.html#existing-data" I can create a new cluster after restoration but my priority saving the cluster. Or do I think wrong, all this steps are same with the converting a standalone PostgreSQL database to PEP cluster?
Please let me know if you need any data or something is not clear.
Here is my leader node patroni config file:
scope: "cluster"
namespace: "/cluster/"
name: 8d454a228d251
restapi:
listen: 172.17.0.2:8008
connect_address: 172.17.0.2:8008
etcd:
host: 172.17.0.2:2379
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
check_timeline: true
postgresql:
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: true
use_slots: true
postgresql:
listen: 0.0.0.0:5433
connect_address: 172.17.0.2:5432
use_unix_socket: true
data_dir: /data/postgresql/
bin_dir: /usr/lib/postgresql/14/bin
config_dir: /etc/postgresql/14/main
authentication:
replication:
username: "patroni_replication"
password: "123123"
superuser:
username: "patroni_superuser"
password: "123123"
parameters:
unix_socket_directories: '/var/run/postgresql/'
logging_collector: 'on'
log_directory: '/var/log/postgresql'
log_filename: 'postgresql-14-8d454a228d25.log'
restore_command: 'cp /data/backup/%f %p'
recovery_target_timeline: 'latest'
promote_trigger_file: '/tmp/promote'
Thanks!
If I understand well you want to restore the primary server (leader) restoring the data directory with a new set of backup files.
After doing the restore in data directory of leader you need to recreate the patroni cluster (remove the keys in DCS) with patronictl remove option.
Example:
stop pgsql_node2
stop pgsql_node3
stop pgsql_node1
on pgsql_node1:
patronictl -c /etc/patroni/config.yml remove <clustername>
start pgsql_node1
start pgsql_node2
start pgsql_node3