PromQL query to calculate service uptime & downtime from a fixed date


I'm trying to build a basic SRE dashboard in order to learn Prometheus/Grafana.

I want to calculate the number of hours the service has been running and the number of hours it has been down since 1 January of the current year, so that I can deduct the downtime hours from the error budget. Could a PromQL query be used to calculate this?

I would prefer to use a metric such as up which would be available regardless of the exporter/client library used.


There is 1 best solution below


First of all, are you trying to calculate the availability of the Prometheus service or the availability of the services which are monitored by Prometheus?

If it's the first case, you can use the "up" metric, which Prometheus sets to 1 when a scrape of a target succeeds and to 0 when it fails. If it's the second, you can use, for example, the "probe_success" metric from the Blackbox exporter.

See more info about the "up" and "probe_success" difference here.

See more info about the Blackbox exporter here.

You can calculate the availability (in percentage) with a query like the following:

100 * avg_over_time(probe_success{instance="xxxxx"}[1w])
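To turn that ratio into hours, which is what the question asks for, one option is to scale the average by the length of the window. This is a minimal sketch, assuming the same placeholder instance label and a one-week window (168 hours). It also assumes the probe ran at a steady interval with no gaps: avg_over_time averages only the samples that exist, so missing samples are not counted as downtime.

```promql
# Approximate hours up in the last week (1w = 7 * 24 = 168 hours)
avg_over_time(probe_success{instance="xxxxx"}[1w]) * 168

# Approximate hours down in the same window
(1 - avg_over_time(probe_success{instance="xxxxx"}[1w])) * 168
```

These are two separate queries, to be run one at a time. The same shape should work with "up" in place of "probe_success".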

In Grafana, you can use the global variable "$__range" as the range duration ([$__range]) so that the query covers the dashboard's current time range.

See the Grafana documentation on global variables for more information.
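For the "since 1 January" part of the question, one possible approach is to set the dashboard time range to From: now/y and To: now, then scale the downtime ratio by the length of that range. The sketch below assumes Grafana's "$__range_s" global variable (the dashboard range in seconds) in addition to "$__range". The instance label is still a placeholder.

```promql
# Approximate hours down over the dashboard's current time range
# ($__range_s is the range in seconds, so divide by 3600 to get hours)
(1 - avg_over_time(probe_success{instance="xxxxx"}[$__range])) * $__range_s / 3600
```

This is best run as an instant query (for example in a Stat panel), so that it is evaluated once at the end of the range. It only gives a meaningful answer if Prometheus retains data back to the start of the range. The default retention is 15 days, so a year-to-date figure would need a longer retention setting or long-term storage.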