I need to create an alerting system that notifies me when a particular condition is met (e.g. Tomcat goes down). Multiple remote servers, deployed in different locations and time zones, host Tomcat services and are monitored by Prometheus. I only want to receive alerts between 8:05 and 22:45 local time, so I proceeded as follows:
- Defined a custom rule "check_system_time_in_interval" that returns 1 if the server's local time is in [8:05, 22:45] and 0 otherwise.
- Used that rule to define an alert, "Inhibit alert during NO working hours", in "prometheus.rule.yml":
    - name: quite_hours
      rules:
      - alert: Inhibit alert during NO working hours
        expr: check_system_time_in_interval==0
        labels:
          notification: none
          severity: critical
- Defined a new inhibit rule in "alertmanager.yml" that inhibits the alert "TOMCAT down" (which fires when the Tomcat service is out of service) while the monitored server's time is outside the interval:
    inhibit_rules:
    - source_match:
        alertname: Inhibit alert during NO working hours
      target_match_re:
        alertname: (TOMCAT down)
The "TOMCAT down" alert uses a custom rule "tomcat_up" that checks whether Tomcat is up. This seems to work, but the approach breaks down across time zones: I need to be notified whenever the LOCAL time of the monitored server is in [8:05, 22:45], even if the Prometheus server is in a different time zone.
One simple solution would be to inhibit the alert only if the 'instance' label of the check_system_time_in_interval time series equals the 'instance' label of TOMCAT down (e.g. if check_system_time_in_interval{instance="10.41.0.118"}=0 and tomcat_up{instance="10.41.0.118"}=1 then fire an alert), but I don't know how to modify the inhibit rule to do that.
After a while, I came up with a trivial solution: add a new "timezone" label and inhibit an alert only if the "timezone" labels match.
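For anyone landing here, the label-matching idea can be expressed with the `equal` field of an Alertmanager inhibit rule, which restricts inhibition to source and target alerts that carry the same value for every listed label. The following is only a sketch of how that might look, not my exact config: it assumes both alerts actually carry a "timezone" label (for example, attached as a target label in the Prometheus scrape config so that it propagates to the alert), and the label name itself is just the one I chose.

```yaml
# alertmanager.yml (sketch; assumes both alerts carry a "timezone" label)
inhibit_rules:
- source_match:
    alertname: Inhibit alert during NO working hours
  target_match_re:
    alertname: (TOMCAT down)
  # Inhibit only when source and target alerts share the same value
  # for every label listed here.
  equal: ['timezone']
```

The same mechanism should cover the per-server variant by listing `instance` instead of `timezone`, as long as neither alert expression aggregates that label away. Note that Alertmanager treats a label that is missing on both alerts as equal, so the label has to be present on the alerts for the restriction to have any effect.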