CheckMK Alert Handler Logging Error - Alert Handlers are triggering but not running the script


CMK version: Checkmk Enterprise Edition 2.1.0p28

OS version: Ubuntu 20.04.6 LTS

Error message:

--- Logging error ---
Traceback (most recent call last):
  File "/omd/sites/Site01/lib/python3/cmk/base/events.py", line 112, in event_keepalive
    event_function(context)
  File "/omd/sites/Site01/lib/python3/cmk/base/cee/alert_handling.py", line 205, in process_alert
    execute_alert_rules(raw_context)
  File "/omd/sites/Site01/lib/python3/cmk/base/cee/alert_handling.py", line 216, in execute_alert_rules
    execute_alert_handler_rule(rule, context)
  File "/omd/sites/Site01/lib/python3/cmk/base/cee/alert_handling.py", line 321, in execute_alert_handler_rule
    run_process_alert_handler(handler_name, handler_params, context)
  File "/omd/sites/Site01/lib/python3/cmk/base/cee/alert_handling.py", line 383, in run_process_alert_handler
    handler_pid, handler_pipe = run_alert_handler_async(
  File "/omd/sites/Site01/lib/python3/cmk/base/cee/alert_handling.py", line 442, in run_alert_handler_async
    os.execv(handler_path, command_line_arguments)
File

I set the permissions of the alert_handlers folder and of my jenkins.sh script to 777 to rule out permission issues. I have no idea why this is happening now.
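For context: the traceback ends in os.execv, which hands jenkins.sh straight to the kernel instead of to a shell. For a script, that only succeeds if the file is executable and its first line is a valid #! line naming an interpreter that exists; otherwise os.execv raises OSError. Setting 777 rules out only the first of these conditions. Below is a minimal sketch of a hypothetical check_handler function (not part of Checkmk), assuming a POSIX-style shell with awk, that reports the first of these problems it finds. It is a diagnostic aid under those assumptions, not a confirmed cause of this error.

```shell
# Hypothetical diagnostic helper, not part of Checkmk: check_handler FILE
# Prints the first reason it finds why exec'ing FILE directly would fail,
# or "ok: FILE" if none of the checks below trip.
check_handler() {
  local f="$1" first="" cr interp
  if [ ! -f "$f" ]; then
    echo "missing: $f"; return 1
  fi
  if [ ! -x "$f" ]; then
    echo "not executable: $f"; return 1
  fi
  # The kernel only looks at the first line for a "#!" interpreter line.
  IFS= read -r first < "$f"
  case "$first" in
    '#!'*) ;;
    *) echo "no shebang on line 1: $f"; return 1 ;;
  esac
  # A Windows (CRLF) line ending leaves a carriage return on the interpreter path.
  cr=$(printf '\r')
  case "$first" in
    *"$cr") echo "CRLF line ending in shebang: $f"; return 1 ;;
  esac
  # The interpreter is the first word after "#!" and must itself be executable.
  interp=$(printf '%s\n' "${first#'#!'}" | awk '{print $1}')
  if [ ! -x "$interp" ]; then
    echo "interpreter missing or not executable: $interp"; return 1
  fi
  echo "ok: $f"
}
```

As the site user, this could be called as check_handler ~/local/share/check_mk/alert_handlers/jenkins.sh. Starting the script from an interactive shell is not a perfect substitute for this check: if the #! line is missing, bash falls back to running the file itself, while os.execv simply raises OSError.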


It fails for every server and does not seem to be related to the script itself or to the alert handler configuration. It also does not appear to be caused by a version change, because the alert handler had already been running successfully on the new version.

Answer by Quinn Favo:

It seems CheckMK is easily corrupted. After working with their developers, we arrived at this resolution:

Delete the alert handler rule for the jenkins.sh script in the GUI of the central site RedactedServerName.

On the central site RedactedServerName, delete ~/local/share/check_mk/alert_handlers/jenkins.sh (copy a backup to a safe place first).
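The backup-then-delete step can be scripted so the original is only removed once the copy has been verified. A minimal sketch, assuming it runs in the site user's shell; backup_and_remove and the backup directory are hypothetical names, not part of Checkmk:

```shell
# Hypothetical helper, not part of Checkmk: backup_and_remove HANDLER BACKUP_DIR
# Copies HANDLER into BACKUP_DIR (keeping mode and timestamps) and deletes the
# original only after the copy has been verified byte for byte.
backup_and_remove() {
  local handler="$1" backup_dir="$2" copy
  copy="$backup_dir/$(basename "$handler")"
  mkdir -p "$backup_dir" || return 1
  cp -p "$handler" "$copy" || return 1
  cmp -s "$handler" "$copy" || return 1
  rm -- "$handler"
}
```

For example: backup_and_remove ~/local/share/check_mk/alert_handlers/jenkins.sh ~/alert_handler_backup (the backup directory is an arbitrary choice outside the alert_handlers folder).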

Activate changes. This syncs a clean state to the remote sites.

After activating the changes:

As the site user on the central site RedactedServerName, create a new ~/local/share/check_mk/alert_handlers/jenkins.sh and copy the code from the backup jenkins.sh into it.

To do this, I ran the following commands:

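sudo omd su RedactedServerName    # switch to the site user (opens a new shell)
cd /opt/omd/sites/RedactedServerName/local/share/check_mk/alert_handlers
touch jenkins.sh                  # create an empty file
nano jenkins.sh                   # paste in the code from the backup
chmod +x jenkins.sh               # make the script executable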
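Instead of touch and nano, the file could also be restored from the backup non-interactively. The sketch below assumes the backup still exists; restore_handler is a hypothetical name, not part of Checkmk. It also strips carriage returns, on the assumption that Windows line endings (which break the #! line) might have been involved; the answer does not say what was actually wrong with the old file, and this step would also remove any intentional carriage returns.

```shell
# Hypothetical helper, not part of Checkmk: restore_handler BACKUP TARGET
# Writes BACKUP's content to TARGET with every carriage return removed
# (assumption: guards against Windows line endings), then sets mode 755.
restore_handler() {
  local backup="$1" target="$2"
  tr -d '\r' < "$backup" > "$target" || return 1
  chmod 755 "$target"
}
```

Run it as the site user (after omd su) so the new file is owned by the site, e.g. restore_handler ~/backup/jenkins.sh ~/local/share/check_mk/alert_handlers/jenkins.sh, where ~/backup stands for wherever the backup was saved.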

Create the alert handler rule again in the GUI of the central site.

Activate changes.