Problem: No incident is created when a time series exceeds the threshold.
I want to be alerted when more than 5% of all requests to a Cloud Run service return a 4xx response. I created an alert policy with the following query:
fetch cloud_run_revision::run.googleapis.com/request_count
| { filter metric.response_code_class = '4xx'
; ident }
| group_by [resource.service_name], 1m, max(val())
| ratio
| condition val() > 0.05 '10^2.%'
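To make the intent concrete, the check I have in mind can be sketched in Python as follows. This is only an illustration of the arithmetic, not of how Cloud Monitoring evaluates the query; the function names and the handling of a window without traffic are assumptions made for this sketch.

```python
# Illustration only: the names below are hypothetical and not part of any
# Cloud Monitoring API.

THRESHOLD = 0.05  # 5%, the same value used in the condition above


def ratio_4xx(count_4xx: int, count_total: int) -> float:
    """Share of requests in one window that returned a 4xx response."""
    if count_total == 0:
        # Assumption of this sketch: a window without traffic counts as 0%.
        return 0.0
    return count_4xx / count_total


def should_alert(count_4xx: int, count_total: int) -> bool:
    """True when the 4xx share is strictly above the threshold."""
    return ratio_4xx(count_4xx, count_total) > THRESHOLD
```

With these definitions, 6 out of 100 requests returning 4xx would be above the threshold, while exactly 5 out of 100 would not, because the comparison is strict.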
In the cloud console, I can see that there are in fact time series which exceed the threshold:
I would expect an incident to be created, but none is.
For the sake of completeness, I created the alert policy with Terraform:
resource "google_monitoring_alert_policy" "cloudrun_http_4xx_errors" {
  display_name = "CloudRun 4xx errors"
  documentation {
    content = "CloudRun returned 4xx for more than 5% of its requests."
  }
  combiner = "OR"
  notification_channels = var.environment == "dev" ? [] : [
    google_monitoring_notification_channel.pubsubchannel.name
  ]
  conditions {
    display_name = "4xx errors"
    condition_monitoring_query_language {
      query = <<EOT
fetch cloud_run_revision::run.googleapis.com/request_count
| { filter metric.response_code_class = '4xx'
; ident }
| group_by [resource.service_name], 1m, max(val())
| ratio
| condition val() > 0.05 '10^2.%'
EOT
      duration = "60s"
    }
  }
}