We collect the following two Prometheus metrics over time. The first is:
# HELP was_sib_durableSubscription_messageWait_time_seconds_total Total amount of time (in seconds) spent on the bus by messages consumed from this subscription.
# TYPE was_sib_durableSubscription_messageWait_time_seconds_total gauge
The second is:
# HELP was_sib_durableSubscription_messageWait_total The number of messages that waited on the bus.
# TYPE was_sib_durableSubscription_messageWait_total counter
We tried dividing the two metrics, but the result does not seem right.
How can we create a graph that shows the average wait time per message that waited on the bus?
The following PromQL query returns the average wait time over the last 10 minutes:
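A sketch of a query matching that description, assuming the metric names exactly as exposed above and no extra label filters or aggregation:

```promql
increase(was_sib_durableSubscription_messageWait_time_seconds_total[10m])
  /
increase(was_sib_durableSubscription_messageWait_total[10m])
```

Provided both metrics carry the same labels, `/` matches series with identical label sets, so the graph shows one line per subscription in seconds per message. If no messages waited during the window, both increases are 0 and the result is NaN, which shows up as a gap in the graph.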
It uses the `increase()` function to calculate how much the total wait time and the number of wait-time measurements grew over the last 10 minutes (such monotonically increasing metrics are called counters).
It also uses the `/` operator to divide the sum of wait times over the last 10 minutes by the number of wait-time measurements during the same 10 minutes.

If you need to calculate the average wait time over a different lookbehind window, just replace `10m` in the query above with the needed duration. See the list of supported durations in PromQL.

P.S. If the `was_sib_durableSubscription_messageWait_total` counter changes slowly over time, then `increase()` over this counter in Prometheus may return unexpected results (for example, non-integer increases) because of extrapolation; see this issue for details. The workaround is to use the `increase()` function from VictoriaMetrics, a Prometheus-like monitoring solution I work on. Its `increase()` function doesn't use extrapolation. See these docs for more details.
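To graph a different lookbehind window (in Prometheus or in VictoriaMetrics, which accepts this PromQL syntax), only the duration inside the square brackets changes. A sketch of the one-hour variant, again assuming the metric names from the question:

```promql
increase(was_sib_durableSubscription_messageWait_time_seconds_total[1h])
  /
increase(was_sib_durableSubscription_messageWait_total[1h])
```

Because `increase()` is `rate()` multiplied by the window length in seconds, dividing two `rate()` calls over the same window yields the same ratio. Longer windows give a smoother graph but react more slowly to changes in wait time.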