CloudWatch alarm doesn't leave the ALARM state and doesn't retrigger


I created a custom metric with the unit Count. The requirement is to check every 24 hours whether the sum of the metric is >= 1. If so, a message should be sent to an SNS topic, which triggers a Lambda function that posts a message to a Slack channel.

Metric behaviour: Currently the custom metric is always higher than one. I create a datapoint every 10 seconds.
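A minimal sketch of what publishing one such datapoint could look like, assuming the metric is sent with boto3's put_metric_data. The namespace, metric name, and dimensions are taken from the alarm definition further down; the helper name build_put_metric_data_kwargs is hypothetical, and how the datapoints are actually produced is not stated here.

```python
def build_put_metric_data_kwargs(invalid_count: float) -> dict:
    """Build keyword arguments for CloudWatch PutMetricData (hypothetical helper).

    Namespace, metric name, and dimensions mirror the alarm definition;
    the value passed in is whatever count was observed in the interval.
    """
    return {
        "Namespace": "Message validation",
        "MetricData": [
            {
                "MetricName": "InvalidMessages",
                "Dimensions": [
                    {"Name": "stream", "Value": "raw events"},
                    {"Name": "stage", "Value": "stg"},
                ],
                "Unit": "Count",
                "Value": float(invalid_count),
            }
        ],
    }


# Assumed usage (requires AWS credentials, so it is not executed here):
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**build_put_metric_data_kwargs(3))
```

The alarm only sees this metric if the dimensions match exactly, which is why they are copied verbatim from the alarm definition.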

Alarm behaviour: The alarm immediately switches to the ALARM state and sends a message to the SNS topic. However, it never leaves the ALARM state, and it does not send a new message to the SNS topic 24 hours later.

How should I configure my alarm if I want to achieve my requirement?

Thanks in advance, Patrick

Here is the output of aws cloudwatch describe-alarms:

{
"MetricAlarms": [
    {
        "AlarmName": "iot-data-platform-stg-InvalidMessagesAlarm-1OS91W5YCQ8E9",
        "AlarmArn": "arn:aws:cloudwatch:eu-west-1:xxxxxx:alarm:iot-data-platform-stg-InvalidMessagesAlarm-1OS91W5YCQ8E9",
        "AlarmDescription": "Invalid Messages received",
        "AlarmConfigurationUpdatedTimestamp": "2020-04-03T18:11:15.076Z",
        "ActionsEnabled": true,
        "OKActions": [],
        "AlarmActions": [
            "arn:aws:sns:eu-west-1:xxxxx:iot-data-platform-stg-InvalidMessagesTopic-FJQ0WUJY9TZC"
        ],
        "InsufficientDataActions": [],
        "StateValue": "ALARM",
        "StateReason": "Threshold Crossed: 1 out of the last 1 datapoints [3.0 (30/03/20 11:49:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition).",
        "StateReasonData": "{\"version\":\"1.0\",\"queryDate\":\"2020-03-31T11:49:03.417+0000\",\"startDate\":\"2020-03-30T11:49:00.000+0000\",\"statistic\":\"Sum\",\"period\":86400,\"recentDatapoints\":[3.0],\"threshold\":1.0}",
        "StateUpdatedTimestamp": "2020-03-31T11:49:03.421Z",
        "MetricName": "InvalidMessages",
        "Namespace": "Message validation",
        "Statistic": "Sum",
        "Dimensions": [
            {
                "Name": "stream",
                "Value": "raw events"
            },
            {
                "Name": "stage",
                "Value": "stg"
            }
        ],
        "Period": 86400,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching"
    }
]
}
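The behaviour described above is consistent with alarm actions running only when the alarm changes state, which is how CloudWatch handles SNS actions. The following is a rough simulation under that assumption, not CloudWatch's actual evaluation logic: the AlarmSim class and run function are hypothetical, each list entry stands for the Sum over one 86400-second period, and a missing period is modelled as 0 to mirror TreatMissingData set to notBreaching.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmSim:
    """Toy model of an alarm that notifies only on an OK -> ALARM transition."""
    threshold: float = 1.0
    state: str = "OK"
    notifications: list = field(default_factory=list)

    def evaluate(self, day: int, daily_sum: float) -> None:
        # GreaterThanOrEqualToThreshold with 1 out of 1 datapoints to alarm.
        new_state = "ALARM" if daily_sum >= self.threshold else "OK"
        if new_state != self.state:
            self.state = new_state
            # AlarmActions run on entering ALARM; OKActions is empty here.
            if new_state == "ALARM":
                self.notifications.append(day)


def run(daily_sums):
    """Return the day indexes on which a notification would be sent."""
    alarm = AlarmSim()
    for day, daily_sum in enumerate(daily_sums):
        alarm.evaluate(day, daily_sum)
    return alarm.notifications


# Metric breaches every day: only the first day notifies.
print(run([3.0, 5.0, 2.0, 4.0]))
# Metric drops below the threshold on day 1: day 2 notifies again.
print(run([3.0, 0.0, 2.0]))
```

In this model, a metric that breaches the threshold in every period keeps the alarm in ALARM with no further transitions, so no second message is sent.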
