I run a DB cluster that needs to be in time. Sadly, sometimes my VM hoster is moving the VM with such DB node to another host and then the time is lacking a second or more. My DB node then shuts down and is restarted by systemd.
My systemd file contains this:
ExecStartPre=-+/usr/bin/chronyc -a makestep
ExecStart=/usr/local/bin/.......
I expected this to sync my time immediately after such time lag shut down the database. But due to my logs it took up to 7 minutes until the difference was recognized and fixed. My database detected the gap on every restart and shut down again. Finally, I get this chronyd log:
Nov 16 10:25:51 dc3-sirius chronyd[164166]: System clock was stepped by 0.000020 seconds
Nov 16 10:26:07 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:26:23 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:26:39 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:26:55 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:11 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:27 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:43 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:59 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:15 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:31 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:47 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:59 dc3-sirius chronyd[164166]: Source 81.169.199.94 replaced with 212.71.244.243
Nov 16 10:29:03 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:29:19 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:29:32 dc3-sirius chronyd[164166]: Selected source 109.230.227.90
Nov 16 10:29:35 dc3-sirius chronyd[164166]: System clock was stepped by 0.003850 seconds
Nov 16 10:29:51 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:07 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:23 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:39 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:55 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:11 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:27 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:43 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:59 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:32:13 dc3-sirius chronyd[164166]: Can't synchronise: no majority
Nov 16 10:32:15 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:32:31 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:32:33 dc3-sirius chronyd[164166]: Selected source 109.230.227.90
Nov 16 10:32:33 dc3-sirius chronyd[164166]: System clock wrong by 1.101260 seconds, adjustment started
Nov 16 10:32:48 dc3-sirius chronyd[164166]: System clock was stepped by 1.003151 seconds
Nov 16 10:33:04 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:33:21 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:33:37 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:33:51 dc3-sirius chronyd[164166]: Selected source 162.159.200.123
Nov 16 10:33:53 dc3-sirius chronyd[164166]: System clock was stepped by 0.409613 seconds
As you can see, it started to sync the clock after >7 minutes:
My DB detected the issue at 10:25:51. From this, the above command was executed several times to re-sync the clock before every database restart. But it needed until 10:32:33 and 10:33:53 to really finally fix the clock.
Any idea how I can force the clock to get synchronized directly and not many minutes later?
I finally found a solution to keep chrony and also force an immediate time sync in case of a time lag (detected by the DB node). The solution was to restart the chronyd service - simulating a reboot of the system.
I changed my systemd file of the database to look like this:
In the
/etc/chrony.conf
file I added the lines:This forces chronyd to re-sync the time on a restart if the time offset is bigger than 0.5 seconds.
This finally makes my system re-sync directly and then restart the database node immediately.
You can look here to find more information about the chrony.conf options.