GCP alert: Only alerting when threshold is violated for multiple measurement periods


In GCP, I have an alerting policy on database CPU and memory usage. For example, if CPU is over 50% over a 1m period, the alert fires.

The alert is kind of noisy. With other systems, I've been able to alert only if the threshold is violated multiple times, e.g.

  • If the threshold is violated for 2 consecutive minutes.
  • If over a 5 minute period, the threshold is violated in 3 of those minutes.

(Note: I don't want to simply change my alignment period to 2 minutes.)
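To make the two desired rules concrete, here is a minimal Python sketch that evaluates them over a list of per-minute CPU samples. The function names and the sample values are hypothetical and only illustrate the logic. This is not how GCP evaluates a policy.

```python
from typing import Sequence


def consecutive_violations(samples: Sequence[float], threshold: float, n: int) -> bool:
    """Return True if the most recent n samples all exceed the threshold."""
    if len(samples) < n:
        return False
    return all(value > threshold for value in samples[-n:])


def m_of_n_violations(samples: Sequence[float], threshold: float, m: int, n: int) -> bool:
    """Return True if at least m of the most recent n samples exceed the threshold."""
    window = samples[-n:]
    return sum(value > threshold for value in window) >= m


# Hypothetical per-minute CPU utilization, oldest first (0.0 to 1.0).
cpu = [0.42, 0.55, 0.48, 0.61, 0.58]

print(consecutive_violations(cpu, 0.50, 2))  # rule 1: 2 consecutive minutes
print(m_of_n_violations(cpu, 0.50, 3, 5))    # rule 2: 3 of the last 5 minutes
```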

There are a couple things I've seen in the GCP alert configuration that might help here:

  1. Change the "trigger"
    • UI: "Alert trigger: Number of time series violates", "Minimum number of time series in violation: 2".
    • JSON: "trigger": {"count": 2}
  2. Change the "retest window"
    • UI: "Advanced Options" → "Retest window: 2m"
    • JSON: "duration": "120s"

But I can't figure out exactly how these work. Can these be used to achieve the goal?
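For reference, the sketch below shows where the two settings sit inside a threshold condition of an alerting policy, built as a Python dict and printed as JSON. The display name, the Cloud SQL filter, and the `ALIGN_MEAN` aligner are assumptions for illustration, since the question does not say which database metric is used. The comments reflect my reading of the Cloud Monitoring API reference and should be checked against it.

```python
import json

# Assumed example: a Cloud SQL CPU utilization condition. The filter and
# display name are hypothetical; substitute the metric actually in use.
condition = {
    "displayName": "DB CPU above 50% (hypothetical name)",
    "conditionThreshold": {
        "filter": (
            'resource.type = "cloudsql_database" AND '
            'metric.type = "cloudsql.googleapis.com/database/cpu/utilization"'
        ),
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.5,  # utilization is reported as a fraction
        # "Retest window" in the UI: how long a time series must stay in
        # violation before the condition is considered met.
        "duration": "120s",
        # Counts how many *time series* must be in violation at once,
        # not how many measurement periods a single series has violated.
        "trigger": {"count": 1},
        "aggregations": [
            {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
        ],
    },
}

print(json.dumps(condition, indent=2))
```

If that reading is right, `duration` is the setting that corresponds to "violated for 2 consecutive minutes", while `trigger.count` only matters when the policy watches several time series (for example, several database instances).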


There is 1 best solution below


I think the retest window option is useful in this scenario. I have a similar requirement and have set it up in a GCP alert policy for database CPU utilisation breaching 70% over a 5-minute rolling window. If the condition clears within 5 minutes, it won't alert, but if it reappears and persists for more than 10 minutes, it can trigger the alert.

I have set the retest window to 10 minutes.
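The behaviour described above can be simulated with a small Python sketch. It assumes one aligned sample per minute and that any non-violating sample resets the retest window, which is my understanding of the documented behaviour rather than something verified here. The function name and sample data are hypothetical.

```python
from typing import Optional, Sequence


def first_firing_minute(
    samples: Sequence[float], threshold: float, window: int
) -> Optional[int]:
    """Return the index of the first minute at which the threshold has been
    exceeded for `window` consecutive minutes, or None if that never happens."""
    run = 0  # length of the current streak of violating samples
    for minute, value in enumerate(samples):
        run = run + 1 if value > threshold else 0  # a clear sample resets the streak
        if run >= window:
            return minute
    return None


# A 5-minute spike that clears: never reaches the 10-minute retest window.
brief = [0.8] * 5 + [0.4] * 10
# The spike clears, then the violation returns and persists for 12 minutes.
sustained = [0.8] * 5 + [0.4] * 2 + [0.8] * 12

print(first_firing_minute(brief, 0.70, 10))      # None
print(first_firing_minute(sustained, 0.70, 10))  # 16
```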