I have a k8s cluster (1.22.3) with a Harbor installation (2.5.0, installed via Helm chart 1.9.0). Harbor is configured to use the internal database, and everything worked fine.
Some time ago I removed Docker from all nodes and reconfigured k8s to use containerd directly (based on https://kruyt.org/migrate-docker-containerd-kubernetes/).
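Roughly, the runtime switch from that guide boils down to draining each node, installing containerd with its CRI plugin enabled, and pointing the kubelet at the containerd socket, e.g. in /var/lib/kubelet/kubeadm-flags.env (paths and flags as on my kubeadm/Ubuntu nodes; yours may differ):

```
# /var/lib/kubelet/kubeadm-flags.env — tell the kubelet to use containerd via CRI
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```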
All services work normally after that, but the PostgreSQL pod for Harbor crashes periodically.
In the pod log I can see the following:
2022-04-26 09:26:35.794 UTC [1] LOG: database system is ready to accept connections
2022-04-26 09:31:42.391 UTC [1] LOG: server process (PID 361) exited with exit code 141
2022-04-26 09:31:42.391 UTC [1] LOG: terminating any other active server processes
2022-04-26 09:31:42.391 UTC [374] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [374] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [374] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [364] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [364] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [364] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [245] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [245] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [245] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [157] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [157] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [157] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [22] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [22] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [22] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [123] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [123] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [123] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.391 UTC [244] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.391 UTC [244] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.391 UTC [244] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.392 UTC [243] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.392 UTC [243] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.392 UTC [243] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.392 UTC [246] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.392 UTC [246] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.392 UTC [246] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:42.432 UTC [69] WARNING: terminating connection because of crash of another server process
2022-04-26 09:31:42.432 UTC [69] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-04-26 09:31:42.432 UTC [69] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-04-26 09:31:43.031 UTC [375] FATAL: the database system is in recovery mode
2022-04-26 09:31:43.532 UTC [376] LOG: PID 243 in cancel request did not match any process
2022-04-26 09:31:46.992 UTC [1] LOG: all server processes terminated; reinitializing
2022-04-26 09:31:47.545 UTC [377] LOG: database system was interrupted; last known up at 2022-04-26 09:26:35 UTC
2022-04-26 09:31:47.545 UTC [378] LOG: PID 245 in cancel request did not match any process
2022-04-26 09:31:50.472 UTC [388] FATAL: the database system is in recovery mode
2022-04-26 09:31:50.505 UTC [398] FATAL: the database system is in recovery mode
2022-04-26 09:31:52.283 UTC [399] FATAL: the database system is in recovery mode
2022-04-26 09:31:56.528 UTC [400] LOG: PID 246 in cancel request did not match any process
2022-04-26 09:31:58.357 UTC [377] LOG: database system was not properly shut down; automatic recovery in progress
2022-04-26 09:31:59.367 UTC [377] LOG: redo starts at 0/63EFC050
2022-04-26 09:31:59.385 UTC [377] LOG: invalid record length at 0/63F6D038: wanted 24, got 0
2022-04-26 09:31:59.385 UTC [377] LOG: redo done at 0/63F6D000
2022-04-26 09:32:00.480 UTC [410] FATAL: the database system is in recovery mode
2022-04-26 09:32:00.511 UTC [420] FATAL: the database system is in recovery mode
2022-04-26 09:32:00.523 UTC [1] LOG: received smart shutdown request
2022-04-26 09:32:04.946 UTC [1] LOG: abnormal database system shutdown
2022-04-26 09:32:05.139 UTC [1] LOG: database system is shut down
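One detail that stands out: if that is the usual shell-style wait status, exit code 141 is not a normal exit but a signal death:

```shell
# Exit codes above 128 encode "killed by signal (code - 128)".
# 141 - 128 = 13, i.e. the backend was killed by SIGPIPE.
sig=$((141 - 128))
echo "$sig"      # 13
kill -l "$sig"   # PIPE
```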
In the pod events I also see messages about liveness/readiness probe failures. There is no resource problem (memory and storage are nowhere near their limits, and the CPU is almost idle).
So I suspect some misconfiguration in containerd, because with Docker everything worked fine.
Env info:
- k8s: 1.22.3
- os: ubuntu 20.04
- containerd: 1.5.5