Grafana Alerting Acting Weird


I am new to Grafana and I am working on an alert that is behaving strangely. The alert is supposed to notify a Slack channel when a service is down. The intent is for it to fire when a service has been down for about an hour, but with the setup below it does not fire even though the query is returning data points and the service has been in the down state for more than an hour.

For: 30m

Expression: avg by(host) (avg_over_time(service_state{host=~"^nodeadmin.*"}[10m] OR on() vector(0))) == 0
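
For reference, here is the variant I am thinking of trying. I am assuming the or on() vector(0) fallback belongs outside avg_over_time (which, as I understand it, takes a plain range vector), and that service_state is 1 when the service is up and 0 when it is down; please correct me if that is wrong:

# Assumed: service_state is 1 when the service is up, 0 when it is down
# Assumed pending period: For: 1h, matching the "down for about an hour" goal
# The or on() vector(0) fallback makes "no data" also count as down
(avg by(host) (avg_over_time(service_state{host=~"^nodeadmin.*"}[10m])) or on() vector(0)) == 0

I am not sure whether moving the fallback like this addresses the non-firing behavior, or whether the For duration is also part of the problem.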

To test, I lowered the For duration to 10 minutes, and now the behavior is even stranger: the alert fires when a node's service has been down for less than 10 minutes (6 to 7 minutes, to be exact). The upside of lowering the For duration to 10 minutes is that some of the nodes whose service had been down for an hour or more did alert, but not all of them.

What would be the best query to resolve this?
