PostgreSQL crashes after migrating to containerd


I have a k8s cluster (1.22.3) with a Harbor installation (2.5.0, installed via Helm chart 1.9.0). Harbor is configured to use the internal database, and everything worked fine.
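For reference, the internal database is the chart default; my install is roughly equivalent to the following (release name and namespace are placeholders, not my exact values):

helm repo add harbor https://helm.goharbor.io
helm upgrade --install harbor harbor/harbor \
  --version 1.9.0 \
  --namespace harbor \
  --set database.type=internal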

Some time ago I removed Docker from all nodes and reconfigured k8s to use containerd directly (based on https://kruyt.org/migrate-docker-containerd-kubernetes/), roughly as sketched below.
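Per node the migration looked roughly like this (a sketch of what the guide describes; exact package names and file paths depend on the setup):

kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# on the node: drop Docker, let containerd serve CRI directly
apt-get remove -y docker-ce docker-ce-cli
containerd config default > /etc/containerd/config.toml
systemctl restart containerd
# point kubelet at the containerd socket instead of dockershim
# (on kubeadm clusters the flags end up in /var/lib/kubelet/kubeadm-flags.env):
#   --container-runtime=remote
#   --container-runtime-endpoint=unix:///run/containerd/containerd.sock
systemctl restart kubelet
kubectl uncordon <node>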

All services work normally after that, but the PostgreSQL pod for Harbor crashes periodically.

In the log of the pod I can see the following:

2022-04-26 09:26:35.794 UTC [1] LOG:  database system is ready to accept connections
2022-04-26 09:31:42.391 UTC [1] LOG:  server process (PID 361) exited with exit code 141
2022-04-26 09:31:42.391 UTC [1] LOG:  terminating any other active server processes
2022-04-26 09:31:42.391 UTC [374] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [374] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [374] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [364] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [364] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [364] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [245] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [245] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [245] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [157] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [157] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [157] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [22] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [22] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [22] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [123] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [123] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [123] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [244] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [244] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [244] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.392 UTC [243] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.392 UTC [243] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.392 UTC [243] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.392 UTC [246] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.392 UTC [246] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.392 UTC [246] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.432 UTC [69] WARNING:  terminating connection because of crash of another server process
2022-04-26 09:31:42.432 UTC [69] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.432 UTC [69] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:43.031 UTC [375] FATAL:  the database system is in recovery mode
2022-04-26 09:31:43.532 UTC [376] LOG:  PID 243 in cancel request did not match any process
2022-04-26 09:31:46.992 UTC [1] LOG:  all server processes terminated; reinitializing
2022-04-26 09:31:47.545 UTC [377] LOG:  database system was interrupted; last known up at 2022-04-26 09:26:35 UTC
2022-04-26 09:31:47.545 UTC [378] LOG:  PID 245 in cancel request did not match any process
2022-04-26 09:31:50.472 UTC [388] FATAL:  the database system is in recovery mode
2022-04-26 09:31:50.505 UTC [398] FATAL:  the database system is in recovery mode
2022-04-26 09:31:52.283 UTC [399] FATAL:  the database system is in recovery mode
2022-04-26 09:31:56.528 UTC [400] LOG:  PID 246 in cancel request did not match any process
2022-04-26 09:31:58.357 UTC [377] LOG:  database system was not properly shut down; automatic recovery in progress
2022-04-26 09:31:59.367 UTC [377] LOG:  redo starts at 0/63EFC050
2022-04-26 09:31:59.385 UTC [377] LOG:  invalid record length at 0/63F6D038: wanted 24, got 0
2022-04-26 09:31:59.385 UTC [377] LOG:  redo done at 0/63F6D000
2022-04-26 09:32:00.480 UTC [410] FATAL:  the database system is in recovery mode
2022-04-26 09:32:00.511 UTC [420] FATAL:  the database system is in recovery mode
2022-04-26 09:32:00.523 UTC [1] LOG:  received smart shutdown request
2022-04-26 09:32:04.946 UTC [1] LOG:  abnormal database system shutdown
2022-04-26 09:32:05.139 UTC [1] LOG:  database system is shut down

In the events of the pod I also see messages about liveness/readiness probe failures. There is no resource problem (no memory limit hit, no storage pressure, CPU almost idle); see the checks below.
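The checks above amount to roughly this (the pod name is just an example; the chart prefixes it with the release name):

kubectl describe pod harbor-database-0 -n harbor    # liveness/readiness probe failures show up under Events
kubectl get events -n harbor --sort-by=.lastTimestamp
kubectl top pod harbor-database-0 -n harbor         # CPU/memory usage is far below any limit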

So I think there is some misconfiguration in containerd, because with Docker everything worked fine.
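Exit code 141 looks like 128 + 13, i.e. the backend was killed by SIGPIPE, which is why I suspect the runtime's exec/probe plumbing rather than Postgres itself. What I have been comparing between the old Docker setup and containerd is roughly this (a sketch; SystemdCgroup is just the first setting I would check, not a confirmed cause):

# on a node
containerd config dump | grep -i -A2 'runc.options'   # is SystemdCgroup set consistently across nodes?
crictl ps | grep database                              # find the database container
crictl inspect <container-id>                          # compare runtime options with the old Docker setup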

env info:

  • k8s: 1.22.3
  • os: ubuntu 20.04
  • containerd: 1.5.5