So I've created a logging alert policy on Google Cloud that monitors the project's logs and sends an alert if it finds a log entry that matches a certain query. This is all well and good, but whenever it sends an email alert, it's barebones. I am unable to include anything useful in the email, such as the actual log message; the user must instead click "View incident" and go to the timeframe of when the alert happened.

Is there no way to include the message? As far as I can tell from the GCP "Using Markdown and variables in documentation templates" doc, there isn't.

I'm only really able to use ${resource.label.x}, which isn't all that useful because the alert already includes most of that information by default.

Could I use something like ${jsonPayload.message}? It didn't work when I tried it.


There are 4 best solutions below

BEST ANSWER

Probably (!) not.

To be clear, alerting policies track metrics (not logs); what you've created is a log-based metric that you're using as the basis for an alert.

There's information loss between the underlying log entry (which contains e.g. jsonPayload) and the metric that's produced from it (which probably does not). You can, however, create log-based metric labels using expressions that include the underlying log entry fields.

However, per the example in Google's docs, you'd want to consider a limited (enum) type for these values (e.g. HTTP status, although even that may be too broad) rather than a potentially unbounded jsonPayload.
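
For illustration, here is a minimal sketch of such a metric definition (in the LogMetric REST shape); the metric name, filter and "status" label are hypothetical, not from the question:

metric_json = {
    # Hypothetical counter metric; adjust the name/filter to your own logs.
    "name": "my_status_metric",
    "description": "Counts matching log entries, labelled by status",
    "filter": 'resource.type="gae_app" AND severity>=ERROR',
    "metricDescriptor": {
        "metricKind": "DELTA",
        "valueType": "INT64",
        # Every key used in labelExtractors must be declared here.
        "labels": [
            {
                "key": "status",
                "valueType": "STRING",
                "description": "HTTP status extracted from the log entry"
            }
        ]
    },
    # Label values are pulled from fields of each matched log entry.
    "labelExtractors": {
        "status": "EXTRACT(httpRequest.status)"
    }
}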

ANSWER 2

It is possible. Suppose you need to pass the "jsonPayload.message" field from your GCP log entry to the documentation section of your policy. You need to use the "label_extractors" feature to extract the log message.

Here is a policy-creation template in which "jsonPayload.message" is passed into the documentation section of the policy:

policy_json = {
  "display_name": "<policy_name>",
  "documentation": {
    "content": "I have the extracted the log message:${log.extracted_label.msg}",
    "mime_type": "text/markdown"
  },
  "user_labels": {},
  "conditions": [
    {
      "display_name": "<condition_name>",
      "condition_matched_log": {
        "filter": "<filter_condition>",
        "label_extractors": {
          "msg": "EXTRACT(jsonPayload.message)"
        }
      }
    }
  ],
  "alert_strategy": {
    "notification_rate_limit": {
      "period": "300s"
    },
    "auto_close": "604800s"
  },
  "combiner": "OR",
  "enabled": True,
  "notification_channels": [
    "<notification_channel>"
  ]
}
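
If it helps, here is a minimal usage sketch (my addition, assuming the google-cloud-monitoring Python client) that creates the policy above; project_id is a placeholder:

import json

from google.cloud import monitoring_v3

project_id = "my-project"  # placeholder: your own project ID

client = monitoring_v3.AlertPolicyServiceClient()

# json_format accepts the snake_case field names and the "300s"/"604800s"
# duration strings used in policy_json above.
policy = monitoring_v3.AlertPolicy.from_json(json.dumps(policy_json))

created = client.create_alert_policy(
    name=f"projects/{project_id}",
    alert_policy=policy,
)
print("Created policy:", created.name)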
ANSWER 3

It is possible, but hard to work out from the Google documentation. This answer is a modification of the answer provided by Naveen Thomas, adapted to the log entry you are trying to pass.

For this, you will have to open the log entry in Logs Explorer, and take a look at the fields available.

For example, in the Cloud SQL error log, the field which contains the message is "textPayload". To pass this to the notification, you would create a log-based alert policy, use the "Extract log labels" option (label_extractors in JSON), enter a display name (I used "msg") and enter "textPayload" as the "log field name".

Then add ${log.extracted_label.msg} to the "Documentation" field, formatting as you see fit.

In JSON, my policy looks like this:

{
  "name": "projects/<project_name>/alertPolicies/<Policy_ID>",
  "displayName": "<Policy name>",
  "documentation": {
    "content": "CloudSQL Instance Log-based alert in project <project_name> detected:\n\n${log.extracted_label.msg}",
    "mimeType": "text/markdown"
  },
  "userLabels": {},
  "conditions": [
    {
      "name": "projects/<project_name>/alertPolicies/<Policy_ID>/conditions/<Condition_ID>",
      "displayName": "Log match condition",
      "conditionMatchedLog": {
        "filter": "resource.type=\"cloudsql_database\"\nresource.labels.database_id=\"<project_name>:<instance_ID>\"\nlogName=\"projects/<project_name>/logs/cloudsql.googleapis.com%2Fsqlserver.err\"\nseverity=(INFO OR ERROR OR CRITICAL OR ALERT OR EMERGENCY)",
        "labelExtractors": {
          "msg": "EXTRACT(textPayload)"
        }
      }
    }
  ],
  "alertStrategy": {
    "notificationRateLimit": {
      "period": "300s"
    },
    "autoClose": "604800s"
  },
  "combiner": "OR",
  "enabled": true,
  "notificationChannels": [
    "projects/<project_name>/notificationChannels/<notificationChannel_ID>"
  ],
  "creationRecord": {
    "mutateTime": "2023-01-01T07:11:53.406233445Z",
    "mutatedBy": "<User>"
  },
  "mutationRecord": {
    "mutateTime": "2023-01-01T13:22:19.917589988Z",
    "mutatedBy": "<User>"
  }
}
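
For completeness, a minimal sketch (my addition, not part of the console export above) of patching just the Documentation field of an existing policy with the Python client; the policy name is a placeholder:

from google.cloud import monitoring_v3
from google.protobuf import field_mask_pb2

client = monitoring_v3.AlertPolicyServiceClient()

# Placeholder: the full resource name of your existing policy.
policy = client.get_alert_policy(
    name="projects/my-project/alertPolicies/1234567890"
)

# Inject the extracted label into the notification documentation.
policy.documentation.content = (
    "CloudSQL Instance Log-based alert detected:\n\n${log.extracted_label.msg}"
)

client.update_alert_policy(
    alert_policy=policy,
    update_mask=field_mask_pb2.FieldMask(paths=["documentation"]),
)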

References:

https://cloud.google.com/logging/docs/logs-based-metrics/labels
https://cloud.google.com/monitoring/alerts/doc-variables#doc-vars

ANSWER 4

Hi team, for this issue I am attaching documentation for your reference on how to create an alert policy in the GCP console. It should really help all GCP folks.

Creating an alert policy metric for Cloud Dataproc job failure on test and prod environments:

• First, go to the Google Cloud console and choose the Flexible environment.

• Go to the Cloud Logging page and view the error logs. According to the error logs, we need to create a metric.

• Here we need to select the cluster name, service name and error message.


• On the Cloud Logging page there is a Create metric option. We need to select it; after selecting it, the metric creation form appears.

• After selecting Create metric, it shows the details and filter selection, and we can add labels as well.

Note: we are creating this custom metric according to our error logs. Here we need to fill in mandatory fields like the log-based metric name, description and filter selection. After filling in the details, click Create metric.

Important: you can check the logs matching your filter with the Preview logs option above the filter selection. The preview shows only a one-hour window of logs. That means if your job failed at 10:30 am, it will show logs up to 11:30 am; after that it won't show any logs.

Note: if you want to create a custom metric, I suggest checking the job failure time and creating the metric then. When creating it, you can also apply a regular expression and pick which label you want from the error logs, as sketched below.
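
As an illustration (a sketch of my own, not part of the original walkthrough), such a metric could look like this in the LogMetric REST shape; the filter, regex and "job_id" label are assumptions to adapt to your own error logs:

dataproc_metric_json = {
    # Hypothetical metric; inspect your own error logs in Logs Explorer
    # to pick the real fields.
    "name": "dataproc_job_failures",
    "description": "Counts Dataproc job-failure log entries",
    "filter": 'resource.type="cloud_dataproc_cluster" AND severity>=ERROR',
    "metricDescriptor": {
        "metricKind": "DELTA",
        "valueType": "INT64",
        "labels": [
            {
                "key": "job_id",
                "valueType": "STRING",
                "description": "Job ID pulled out of the error message"
            }
        ]
    },
    # REGEXP_EXTRACT keeps the first capture group of the match.
    "labelExtractors": {
        "job_id": r'REGEXP_EXTRACT(jsonPayload.message, "Job ([\w-]+) failed")'
    }
}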

• After creating the custom metric, navigate to the Metrics Explorer page.

• Here we need to select our created metric name from the metric dropdown options. After selecting the metric name, we can choose which graph is suitable for creating a threshold value.

• After selecting this, it will show the error logs according to your custom filters. Note: for reference, I am selecting my custom metric, applying a filter on the Dataproc cluster job failure name, and clicking the Apply button.

• After filtering by cluster name, it will show the cluster job failure logs on the graph. Up to here we have only created the metric.

• Now we need to select the Alerting option from the navigation menu on the left side of the console.

• Click the Create policy option. It will take you to the alert policy creation page; select your custom metric name and click the Apply button.

• After selecting your metric, it will show the error graph according to the cluster name. Clicking the Next button takes you to the alert details.

• Here we need to give a threshold value according to our job failures (sketched below) and click the Next button.
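
For reference, an illustrative sketch (my addition) of the condition this step produces, in the AlertPolicy REST shape; the metric name and numbers are assumptions:

threshold_condition = {
    "displayName": "Dataproc job failures > 0",
    "conditionThreshold": {
        # Log-based metrics surface as logging.googleapis.com/user/<name>.
        "filter": 'metric.type="logging.googleapis.com/user/dataproc_job_failures" AND resource.type="cloud_dataproc_cluster"',
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0,
        "duration": "0s",
        "aggregations": [
            {
                "alignmentPeriod": "300s",
                "perSeriesAligner": "ALIGN_SUM"
            }
        ]
    }
}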

• Here we need to fill in the notification channel with your email ID or your team mail group name, and give a subject for the alert notification. Select the policy severity level according to your severity, and give a unique name to your alert policy.

• Once the alert policy is created, you will receive alert notifications according to your severity.

• This is how to create a custom alert metric and policy.