Can AppEngine response count metric be grouped on response class?


I'm trying to create an alert in GCP/Stackdriver using the http/server/response_count metric for App Engine. This metric has a response_code field that I can group by:

fetch gae_app::appengine.googleapis.com/http/server/response_count
| filter metric.response_code>=500 && metric.response_code<600
| every 10m
| group_by [metric.response_code], sum(val())

But say I want to merge all 500+ responses into a single 5xx response class and then aggregate them into one count for the range. Is it possible to preprocess the data so that the group_by in the example above yields a single time series, e.g. 5xx? I notice that one of the load balancer metrics has a "response_code_class" label of this kind, but it is NOT available for this metric.
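Conceptually, the preprocessing asked about here is a mapping from each numeric response_code to its class (for example 503 to "5xx"), followed by one sum per class. The Python sketch below only models that arithmetic on made-up sample counts; it is not Cloud Monitoring code, and the `response_class` and `sum_by_class` helpers are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def response_class(code: int) -> str:
    """Map an HTTP status code to its class label, e.g. 503 -> '5xx'."""
    return f"{code // 100}xx"


def sum_by_class(points: list[tuple[int, int]]) -> dict[str, int]:
    """Collapse (response_code, count) pairs into one total per response class."""
    totals: dict[str, int] = defaultdict(int)
    for code, count in points:
        totals[response_class(code)] += count
    return dict(totals)


# Hypothetical counts for one 10-minute bucket, as (response_code, count) pairs.
sample = [(200, 950), (404, 12), (500, 3), (502, 4), (503, 1)]
print(sum_by_class(sample))  # {'2xx': 950, '4xx': 12, '5xx': 8}
```

All three 5xx codes end up in one "5xx" bucket, which is the single series the question is after.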

After that, I'm looking for the ratio of 5xx requests to all requests. Would that even be possible with this metric?


There are 2 answers below.


Below is a query that does the following:

  • Use a group_by to count the 5xx responses in a 10-minute sliding window.
  • In the same group_by, also count all responses in the same 10-minute sliding window.
  • After the group_by, simply compute the ratio of the two counts.
fetch gae_app
| metric 'appengine.googleapis.com/http/server/response_count'
| group_by [], sliding(10m), [
    countAll: sum(response_count), 
    count5xx: sum(if(response_code>=500 && response_code < 600, response_count, 0))]
| value (count5xx / countAll)
| every 1m
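To see what `group_by [], sliding(10m), [...]` followed by `every 1m` computes, here is a small Python model of the same arithmetic: at each output minute it sums all responses and 5xx responses over the preceding 10 one-minute buckets and divides the two. The per-minute input format and the `sliding_5xx_ratio` name are assumptions for illustration, and Cloud Monitoring's exact window alignment and handling of partial windows may differ from this sketch.

```python
from __future__ import annotations


def sliding_5xx_ratio(per_minute: list[dict[int, int]], window: int = 10) -> list[float | None]:
    """Return count5xx / countAll for each full window of `window` minutes.

    per_minute[i] maps response_code -> count for minute i (hypothetical format).
    One ratio is emitted per minute once a full window is available.
    """
    ratios: list[float | None] = []
    for end in range(window, len(per_minute) + 1):
        count_all = 0
        count_5xx = 0
        for bucket in per_minute[end - window:end]:
            for code, count in bucket.items():
                count_all += count
                if 500 <= code < 600:
                    count_5xx += count
        # Return None instead of raising ZeroDivisionError when there was no traffic.
        ratios.append(count_5xx / count_all if count_all else None)
    return ratios


# Hypothetical data: 11 quiet minutes, then one minute with a burst of 503s.
per_minute = [{200: 99, 500: 1}] * 11 + [{200: 90, 503: 10}]
print(sliding_5xx_ratio(per_minute))  # [0.01, 0.01, 0.019]
```

Because numerator and denominator come from the same window, a burst of errors raises the ratio for the next 10 output points and then drops out, which is what smooths the alert signal.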

[Screenshot of the chart produced by a similar query]

The output of the above query is a ratio of 5xx responses to all responses.

Note: if you wanted to compute these ratios, for example, by zone, simply add zone to the first argument of group_by like this: group_by [zone], sliding(10m), [countAll: ..., count5xx: ...] | value (count5xx / countAll)
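Adding a label such as zone to the first argument of group_by partitions the rows before the sums are taken, so each zone gets its own numerator and denominator. The Python model below shows that effect; the zone names, counts, and the `ratio_5xx_by_zone` helper are hypothetical.

```python
from collections import defaultdict


def ratio_5xx_by_zone(rows: list[tuple[str, int, int]]) -> dict[str, float]:
    """rows are (zone, response_code, count); returns the 5xx / all ratio per zone."""
    count_all: dict[str, int] = defaultdict(int)
    count_5xx: dict[str, int] = defaultdict(int)
    for zone, code, count in rows:
        count_all[zone] += count
        if 500 <= code < 600:
            count_5xx[zone] += count
    # Zones with zero total traffic are skipped to avoid dividing by zero.
    return {zone: count_5xx[zone] / total for zone, total in count_all.items() if total}


rows = [
    ("us-central1-a", 200, 98),
    ("us-central1-a", 502, 2),
    ("us-central1-b", 200, 45),
    ("us-central1-b", 500, 5),
]
print(ratio_5xx_by_zone(rows))  # {'us-central1-a': 0.02, 'us-central1-b': 0.1}
```

With an empty grouping key, the same rows would collapse into one overall ratio (7 / 150) instead of one ratio per zone.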


There's another way to compute the ratio of error responses to all responses. Because the numerator and denominator are derived from the same time series, you can compute the ratio within a single grouping operation, using this query format:

fetch gae_app::appengine.googleapis.com/http/server/response_count
| group_by [],
    sum(if(metric.response_code >= 500 && metric.response_code < 600, val(), 0))
      / sum(val())

For more details, refer to the grouping example in the Cloud Monitoring MQL documentation.