I want to design an SLI/SLO based on the two counters described below:
requestedCounter = Prometheus counter that gets incremented every time a request is sent to the downstream service
confirmedCounter = Prometheus counter that gets incremented every time a confirmation is received notifying that a downstream service has processed a request
Would it make sense to use something like 1 - [ sum(rate(confirmedCounter)) / sum(rate(requestedCounter)) ] to model bad events / total events? Or would something like count_over_time make more sense than rate?
Any other suggestions would be appreciated too as I'm new to Prometheus SLI/SLOs.
count_over_time would not work for your use case, as it counts the number of samples for each series over the specified time period, not how much the counter increased. As an example, check out this query here.
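To make the difference concrete, here is a rough sketch of the two functions applied to the same counter. The 5m window is only an assumption for illustration; pick whatever fits your scrape interval and SLO window.

```promql
# Number of samples stored for each requestedCounter series in the last 5 minutes.
# This depends on the scrape interval, not on how many requests were sent.
count_over_time(requestedCounter[5m])

# Per-second rate of increase of the counter over the same 5 minutes.
# This is what reflects how many requests were actually sent.
rate(requestedCounter[5m])
```

With a 15s scrape interval, for instance, the first query would return roughly 20 for every series no matter how busy the service was.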
It seems like you are interested in the ratio of the rates of increase of the two counter metrics, so using rate makes more sense.
One thing to be careful about when constructing your PromQL query is understanding how operators work (see docs here).
For division, the numerator or denominator could evaluate to a scalar or a vector depending on the query. I'd recommend evaluating the numerator and the denominator independently in the Prometheus expression browser first, so that you can be sure your final query (after division or multiplication) is correct.
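As a minimal sketch of that workflow, using the counter names from the question and an assumed 5m range (rate needs a range selector, which the expression in the question leaves out), you could run the pieces one at a time:

```promql
# Step 1: numerator on its own. sum() returns a single-element vector with no labels.
sum(rate(confirmedCounter[5m]))

# Step 2: denominator on its own. Same shape, so the two sides will match in a division.
sum(rate(requestedCounter[5m]))

# Step 3: the combined expression, i.e. the fraction of requests without a confirmation.
1 - (
  sum(rate(confirmedCounter[5m]))
  /
  sum(rate(requestedCounter[5m]))
)
```

Because both sides are aggregated with sum() and no grouping labels, they match one-to-one. If you later add something like by (some_label) (a hypothetical label here), use the same grouping on both sides so the division still finds matching series.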