I am using monit to monitor my program. The monitored program can crash in two situations:
- Program can randomly crash. It just needs to be restarted
- It gets into a bad state and crashes each time it is started subsequently
To fix the latter situation, I have a script that stops the program, resets it to a good state by cleaning its data files, and restarts it. I tried the config below:
check process program with pidfile program.pid
  start program = "programStart" as uid username and gid groupname
  stop program = "programStop" as uid username and gid groupname
  if 3 restarts within 20 cycles then exec "cleanProgramAndRestart" as uid username and gid groupname
  if 6 restarts within 20 cycles then timeout
Say monit restarts the program 3 times in 3 cycles. After the third restart, the cleanProgramAndRestart script runs. However, because cleanProgramAndRestart restarts the program yet again, the condition of 3 restarts is met again in the next cycle, and it becomes an infinite loop.
Could anyone suggest any way to fix this?
If any of the following is possible, there may be a workaround:
- If there is a "crash" keyword instead of "restarts", I could run the clean script after the program crashes 3 times rather than after it is restarted 3 times
- If there is a way to reset the "restarts" counter after running the exec script
- If there is a way to exec something only when the result of the "3 restarts" condition changes
Monit is polling your "tests" every cycle. The cycle length is usually defined in /etc/monitrc, in set daemon cycle_length. So if your cleanProgramAndRestart takes less than a cycle to perform, it shouldn't happen. As it is happening, I guess your cleanProgramAndRestart takes more than a cycle to perform.
You can:
If you can't modify these variables, there could be a little workaround, with a temp file:
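One possible shape for such a temp-file guard, as a rough sketch rather than a tested monit recipe: a small shell wrapper that records when the cleanup last ran in a marker file and skips the cleanup if it is triggered again too soon. The marker path, the cooldown value, the assumed 60-second cycle length, and the guarded_clean function are all hypothetical names chosen for illustration; only cleanProgramAndRestart comes from the question.

```shell
#!/bin/sh
# Hypothetical guard around the cleanup script: run it at most once per cooldown window.
# MARKER, COOLDOWN and CLEAN_CMD are illustrative names, not monit settings.
MARKER="${MARKER:-/tmp/program.cleaned}"          # temp file holding the last cleanup time
COOLDOWN="${COOLDOWN:-1200}"                      # seconds; e.g. 20 cycles at an assumed 60 s cycle
CLEAN_CMD="${CLEAN_CMD:-cleanProgramAndRestart}"  # the cleanup script from the question

guarded_clean() {
    now=$(date +%s)
    # Read the timestamp of the previous cleanup; treat a missing file as "never".
    last=$(cat "$MARKER" 2>/dev/null)
    last=${last:-0}
    if [ $((now - last)) -lt "$COOLDOWN" ]; then
        echo "cleanup skipped: still in cooldown"
        return 0
    fi
    # Record the time first, so a second trigger during the cleanup is also skipped.
    echo "$now" > "$MARKER"
    $CLEAN_CMD
}

# Only act when invoked explicitly, e.g. "guardedClean.sh --run".
if [ "${1:-}" = "--run" ]; then
    guarded_clean
fi
```

If this were saved under a hypothetical path such as /usr/local/bin/guardedClean.sh, the exec action in the "3 restarts" rule would call that path with --run instead of calling cleanProgramAndRestart directly. The restart performed by the cleanup still counts toward monit's restart counter, but a repeated trigger inside the cooldown window no longer reruns the cleanup.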