In GCP, I have an alerting policy on database CPU and memory usage. For example, if CPU is over 50% over a 1m period, the alert fires.
The alert is kind of noisy. With other systems, I've been able to alert only if the threshold is violated multiple times, e.g.
- If the threshold is violated for 2 consecutive minutes.
- If over a 5 minute period, the threshold is violated in 3 of those minutes.
(Note: I don't want to simply change my alignment period to 2 minutes.)
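To make the two behaviours above concrete, here is a minimal Python sketch of what I mean. It is only a model of the desired semantics, not anything GCP runs; the function names and the sample CPU values are made up for illustration.

```python
from collections import deque
from typing import Iterable, List


def consecutive_rule(samples: Iterable[float], threshold: float, needed: int) -> List[bool]:
    """For each sample, report whether the last `needed` samples in a row all exceeded the threshold."""
    streak = 0
    fired = []
    for value in samples:
        # Reset the streak as soon as one sample is back under the threshold.
        streak = streak + 1 if value > threshold else 0
        fired.append(streak >= needed)
    return fired


def m_of_n_rule(samples: Iterable[float], threshold: float, m: int, n: int) -> List[bool]:
    """For each sample, report whether at least `m` of the last `n` samples exceeded the threshold."""
    window = deque(maxlen=n)  # keeps only the most recent n violation flags
    fired = []
    for value in samples:
        window.append(value > threshold)
        fired.append(sum(window) >= m)
    return fired


# Hypothetical one-minute CPU readings (as fractions), threshold 50%.
cpu = [0.40, 0.55, 0.45, 0.60, 0.70, 0.30, 0.65]
print(consecutive_rule(cpu, 0.5, 2))  # 2 consecutive minutes
print(m_of_n_rule(cpu, 0.5, 3, 5))    # 3 of the last 5 minutes
```

With these sample values, the first rule fires only at the fifth minute, while the second rule fires from the fifth minute onward.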
There are a couple things I've seen in the GCP alert configuration that might help here:
- Change the "trigger"
  - UI: "Alert trigger: Number of time series violates", "Minimum number of time series in violation: 2".
  - JSON:
    "trigger": {"count": 2}
- Change the "retest window"
  - UI: "Advanced Options" → "Retest window: 2m"
  - JSON:
    "duration": "120s"
But I can't figure out exactly how these work. Can these be used to achieve the goal?
The retest window option is useful in this scenario, I think. I have a similar requirement: I set up a GCP alert policy for DB CPU utilization breaching 70% over a 5-minute rolling window. If the condition clears within 5 minutes it won't alert, but if it reappears and persists for more than 10 minutes, it can trigger the alert.
I have set the retest window to a time limit of 10 minutes.
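As a rough illustration, the sketch below builds a threshold condition in Python with a 1-minute alignment period and a 2-minute retest window, matching the "2 consecutive minutes" case from the question. The helper name `build_condition`, the display name, and the Cloud SQL metric filter are assumptions (the question does not say which database product is used). My understanding is that `duration` requires the threshold to be violated for the whole window, and that `trigger.count` counts time series rather than repeated violations, so please verify both against the alerting documentation before relying on this.

```python
import json


def build_condition(threshold: float, alignment_s: int, retest_s: int) -> dict:
    """Build a threshold condition dict (hypothetical helper, for illustration only)."""
    return {
        "displayName": "DB CPU above threshold",  # assumed name
        "conditionThreshold": {
            # Assumption: a Cloud SQL instance; swap in the filter for your database.
            "filter": (
                'resource.type = "cloudsql_database" AND '
                'metric.type = "cloudsql.googleapis.com/database/cpu/utilization"'
            ),
            "comparison": "COMPARISON_GT",
            "thresholdValue": threshold,
            # Retest window: how long the threshold must stay violated.
            "duration": f"{retest_s}s",
            # Number of time series that must be in violation, not number of violations.
            "trigger": {"count": 1},
            "aggregations": [
                {
                    "alignmentPeriod": f"{alignment_s}s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }
            ],
        },
    }


condition = build_condition(threshold=0.5, alignment_s=60, retest_s=120)
print(json.dumps(condition, indent=2))
```

Keeping the alignment period at 60s and setting only `duration` to 120s leaves the per-minute data points unchanged, which is the difference from simply widening the alignment period to 2 minutes.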