Threshold exceeded, but no incident is created in Stackdriver

Problem: No incident is created when a time series exceeds the threshold.

I want to get an alert if more than 5% of all requests to a Cloud Run service return 4xx. I created an alert policy with the following query:

fetch cloud_run_revision::run.googleapis.com/request_count
| { filter metric.response_code_class = '4xx'
  ; ident }
| group_by [resource.service_name], 1m, max(val())
| ratio
| condition val() > 0.05 '10^2.%'

In the cloud console, I can see that there are in fact time series which exceed the threshold:

Cloud Console Screenshot: Graph

I would expect an incident to be created. However, this is not the case.

Cloud Console Screenshot: Incidents

For completeness, I created the alert policy with Terraform:

resource "google_monitoring_alert_policy" "cloudrun_http_4xx_errors" {
  display_name = "CloudRun 4xx errors"

  documentation {
    content = "CloudRun returned 4xx for more than 5% of its requests."
  }
  combiner = "OR"

  notification_channels = var.environment == "dev" ? [] : [
    google_monitoring_notification_channel.pubsubchannel.name,
  ]
  conditions {
    display_name = "4xx errors"
    condition_monitoring_query_language {
      query    = <<EOT
fetch cloud_run_revision::run.googleapis.com/request_count
| { filter metric.response_code_class = '4xx'
  ; ident }
| group_by [resource.service_name], 1m, max(val())
| ratio
| condition val() > 0.05 '10^2.%'
EOT
      duration = "60s"
    }
  }
}