Is there any good way to monitor Apache Beam Dataflow job pipeline state?


We have a Dataflow job that we want to monitor with StatsDClient: we want to send metrics from the Dataflow job to our Telegraf instance through StatsDClient, as a heartbeat for the job, in order to determine whether the job is running or has failed so that we can set up alerts on it.

We tried initializing StatsDClient in the main function and sending metrics after checking PipelineResult.getState(), but this approach is not working for us.
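One likely reason a single getState() check in main does not work: on Dataflow, run() returns as soon as the job is submitted, so the state read in main is usually just the initial one, and the heartbeat stops when the launcher process exits. A sketch of the alternative, polling the result in a loop from a process that stays alive, is shown below in Python (the Java SDK is analogous). The host/port and metric names are hypothetical, and the StatsD gauge is written directly over UDP instead of through a client library:

```python
import socket
import threading
import time

def statsd_gauge_line(name: str, value: int) -> bytes:
    # StatsD plaintext protocol: "<metric>:<value>|g" encodes a gauge.
    return f"{name}:{value}|g".encode("ascii")

def send_heartbeat(host: str, port: int, name: str, value: int) -> None:
    # StatsD is fire-and-forget UDP, so this never blocks the launcher.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_gauge_line(name, value), (host, port))

def monitor(result, host="telegraf.internal", port=8125, interval_s=60):
    # Poll the PipelineResult until the job reaches a terminal state,
    # emitting 1 while it is running and 0 once it has stopped.
    while True:
        state = str(result.state)  # e.g. RUNNING, DONE, FAILED, CANCELLED
        send_heartbeat(host, port, "dataflow.job.running",
                       1 if state == "RUNNING" else 0)
        if state in ("DONE", "FAILED", "CANCELLED"):
            break
        time.sleep(interval_s)

# In main, after result = pipeline.run():
# threading.Thread(target=monitor, args=(result,), daemon=True).start()
# (and keep the launcher alive, e.g. with result.wait_until_finish())
```

The catch is that the polling process itself must stay up for the lifetime of the job, which is why the answer below suggests moving the monitoring out of the job entirely.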

There is 1 answer below.

Mazlum Tosun

Instead of checking the state from inside the Dataflow job, you can use Cloud Monitoring:

  • Metric: Dataflow job failed (for example)
  • Alerting policy based on this metric

The alert can be sent to a PubSub topic.

You can then develop the Pub/Sub client of your choice that consumes messages from this topic (via a subscription) and sends the elements to your client.
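The body of a Cloud Monitoring notification delivered to Pub/Sub is JSON containing an "incident" object. A minimal sketch of the parsing step is below; the field names follow the documented notification format, but the sample values and policy name are made up, and the actual streaming pull would use the google-cloud-pubsub library (not shown):

```python
import json

def parse_monitoring_alert(message_data: bytes) -> dict:
    """Extract the useful fields from a Cloud Monitoring Pub/Sub notification."""
    incident = json.loads(message_data)["incident"]
    return {
        "incident_id": incident["incident_id"],
        "policy_name": incident["policy_name"],
        "state": incident["state"],  # "open" when firing, "closed" when resolved
        "summary": incident.get("summary", ""),
    }

# Hypothetical sample payload, shaped like a Monitoring notification:
sample = json.dumps({
    "incident": {
        "incident_id": "abc123",
        "policy_name": "dataflow-job-failed",
        "state": "open",
        "summary": "Dataflow job my-job has failed.",
    }
}).encode()

alert = parse_monitoring_alert(sample)
```

With google-cloud-pubsub, the subscriber callback would call parse_monitoring_alert(message.data), forward the result to your StatsD/Telegraf client, then ack the message.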

Alerting policy:

There is a built-in metric for the Dataflow job failed status; you can create an alerting policy based on this metric:

(screenshot: selecting the built-in Dataflow failed-job metric)

Then configure a threshold:

(screenshot: configuring the alert threshold)

If one Dataflow job fails, it will trigger an alert.
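If you prefer to define the policy as code rather than through the console, a policy file along these lines can work. The metric name dataflow.googleapis.com/job/is_failed is, to my knowledge, the built-in failed-job metric, but verify it against the metrics list for your project, and the display names are placeholders:

```json
{
  "displayName": "Dataflow job failed",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "job/is_failed above 0",
      "conditionThreshold": {
        "filter": "metric.type = \"dataflow.googleapis.com/job/is_failed\" AND resource.type = \"dataflow_job\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0,
        "duration": "0s"
      }
    }
  ]
}
```

The file can then be applied with `gcloud alpha monitoring policies create --policy-from-file=policy.json`, and the Pub/Sub notification channel attached to the resulting policy.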

Notification channel:

For the alerting policy's notification channel, you can choose a Pub/Sub topic.

(screenshot: choosing a Pub/Sub topic as the notification channel)
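The same channel can also be created from the CLI; a sketch is below, where the project ID and topic name are placeholders (the topic must already exist, and the Monitoring service account needs publish permission on it):

```shell
gcloud beta monitoring channels create \
  --type=pubsub \
  --display-name="dataflow-alerts-topic" \
  --channel-labels=topic=projects/MY_PROJECT/topics/dataflow-alerts
```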

For the Dataflow job status in general (not only failed jobs), I saw a job/status metric, currently in beta, but I have not used it yet.