Email alert based on monitors going down / coming back up

I am fairly new to this and need some help with my Watcher setup. I am using X-Pack Watcher.

I have set up Heartbeat and currently have 7 monitors (monitor-01, monitor-02, etc.).

I need help setting up three specific scenarios:

Scenario 1: If monitor-01 goes offline, I want to send ONLY 1 email to "[email protected]" with the body of: "Hello there, monitor-01 just went offline! Please check, thanks."

If monitor-02 goes offline, I want exactly the same result as above. I don't want multiple emails alerting me every second or minute while the monitor is down; I only want 1 email.

Scenario 2: If monitor-01 or any of my other monitors is still offline, I want a reminder email sent out every 3 hours. I would like the email body to state how long the specific monitor has been down, e.g. "monitor down for 120 hours 13 minutes". So, once 3 hours pass, I want to send an email to "[email protected]" with the body of: "Hello there, this is a reminder email that monitor-01 is still offline! Please check, thanks."

Scenario 3: If any of the monitors comes back online, I want to send an email to "[email protected]" with the body of: "Hello there, great news! monitor-02 is back online. The monitor was down for 7 hours 12 minutes. Thanks."

Can someone please assist? I have looked everywhere and cannot find the correct syntax to create the above scenarios. I feel these scenarios could benefit other members of the community too.

P.S. I currently have an advanced watch that I found in the forums, but it does not match my criteria. Here is the code for it:

{
  "trigger": {
    "schedule": {
      "interval": "30s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "heartbeat-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              "must": {
                "match": {
                  "monitor.status": "down"
                }
              },
              "filter": {
                "range": {
                  "@timestamp": {
                    "gte": "now-50s"
                  }
                }
              }
            }
          },
          "aggregations": {
            "by_monitors": {
              "terms": {
                "field": "monitor.name"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "email_admin": {
      "email": {
        "profile": "standard",
        "from": "[email protected]",
        "to": [
          "[email protected]"
        ],
        "subject": "Monitor is DOWN: {{ctx.payload.aggregations.by_monitors.buckets.0.key}}",
        "body": {
          "text": "Hello, there is a monitor offline currently. Please check..."
        }
      }
    }
  }
}

The above watch sends an email every 30 seconds while a monitor is down, which is not what I want.

Here is what the email says when everything is put together:

Subject: Monitor is DOWN: [UAT] Test Website
Body: Hello, there is a monitor offline currently. Please check...

Can someone assist with my scenarios? I have already spent days on this, with many hours put in and not much to show for it!

Thanks.

There is 1 best solution below

There are two things you could look into: throttling and acknowledging notifications.

  • throttling: once an action has run (e.g. sending an email), Watcher does not run that action again for a configured period, such as 3 hours
  • acknowledge: in this case you call the API to say: "I did receive the notification that this watch has found an error, don't send any notifications about it again."
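As a rough sketch of the throttling option, the `actions` section of the watch in the question could carry a `throttle_period` on the email action. This is only a fragment, meant to replace the existing `actions` block; the 3-hour value and the body wording are taken from the scenarios in the question, and reusing the first aggregation bucket as the monitor name in the body is an assumption carried over from the original subject line.

```json
{
  "actions": {
    "email_admin": {
      "throttle_period": "3h",
      "email": {
        "profile": "standard",
        "from": "[email protected]",
        "to": [
          "[email protected]"
        ],
        "subject": "Monitor is DOWN: {{ctx.payload.aggregations.by_monitors.buckets.0.key}}",
        "body": {
          "text": "Hello there, {{ctx.payload.aggregations.by_monitors.buckets.0.key}} is offline! Please check, thanks."
        }
      }
    }
  }
}
```

With a setting like this, the first email should go out as soon as the condition is met, and the action should then stay quiet for 3 hours even though the watch keeps running every 30 seconds. That roughly covers Scenario 1 and the reminder interval of Scenario 2. Note that the throttle applies to the action as a whole, not to each monitor, so a second monitor going down inside the 3-hour window would probably not produce its own email; one watch per monitor is one possible way around that. For acknowledging, recent versions expose `PUT _watcher/watch/<watch_id>/_ack/email_admin`, which silences the action until the condition evaluates to false again. Neither option by itself reports the downtime duration or sends the "back online" email from Scenario 3.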