Is Elastic/Metricbeat suitable for process monitoring and alerting?


Do you use Elastic and Metricbeat for process monitoring and alerting? How did you configure your data gathering and alerting?

I am currently trying to set this up and am running into some basic issues, which make me question whether Elastic is a suitable tool for alerting. Here is my planned setup:

  • Use Metricbeat to gather process data
  • Create an Elastic dashboard/Lens visualization for certain processes
  • If the process.cpu.start_time from Metricbeat is very recent (e.g. the process has been running for under 5 minutes), alert!

I have been working my way through this using the following approach:

  • Metricbeat's process documents include process.cpu.start_time as a text string in ISO date format. Elastic Lens queries are very limited with dates.
  • Workaround: use a Logstash filter to create a field process.cpu.start_epoch, which is an integer holding the Unix epoch: "seconds since January 1, 1970".
  • Create a dashboard Lens visualization, querying only my process and only the latest metric. This works and gives me "the time that the process started, as a Unix epoch".
  • Next I need to calculate the time difference between now and that integer. However, I don't see anything in the Lens documentation about doing date math, so I'm stuck.
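For reference, the date math I am after can be sketched outside Elastic in a few lines of Python. This is only an illustration of the intended check, not something Metricbeat or Lens provides: the function names, the sample timestamp format, and the assumption that the ISO string carries a UTC offset (or a trailing "Z") are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the "running for under 5 minutes" example above.
YOUNG_PROCESS_THRESHOLD = timedelta(minutes=5)


def parse_start_time(iso_text: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC.

    Assumes the string includes a timezone, so the result is timezone-aware.
    """
    return datetime.fromisoformat(iso_text.replace("Z", "+00:00"))


def is_recently_started(start_iso: str, now: datetime) -> bool:
    """Return True when the process is younger than the threshold."""
    age = now - parse_start_time(start_iso)
    return age < YOUNG_PROCESS_THRESHOLD


if __name__ == "__main__":
    # Hypothetical start time; a real value would come from the Metricbeat document.
    current_time = datetime.now(timezone.utc)
    print(is_recently_started("2023-01-01T12:00:00.000Z", current_time))
```

Passing "now" in explicitly, rather than reading the clock inside the function, keeps the check easy to test.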

The difficulties I am encountering are making me wonder whether I am "doing it wrong". Is Elastic/Metricbeat a suitable tool for what I am trying to achieve?

Best answer, by EdwardTeach

Answer: find the right hammer!

What I needed is called "Elastic runtime fields". There's a step-by-step writeup here: https://elastic-content-share.eu/elastic-runtime-field-example-repository/

Summary:

  • open index
  • click the "dots"
  • choose "add field to index pattern"
  • set output field name as desired
    • for me this is process.cpu.start.age
  • set output type
    • for me this is "long"
  • write your script in "Painless"
    • for me this is emit(new Date().getTime() - doc['process.cpu.start'].value.toEpochMilli()); which yields the process age in milliseconds
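To make the arithmetic in that script easier to check, here is a rough Python mirror of what the runtime field computes. It is a sketch under assumptions, not Elastic code: the document is modelled as a flat dict keyed by the field name, the start time is assumed to be an ISO string with a timezone, and start_age_millis is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def start_age_millis(doc: dict, now: datetime) -> Optional[int]:
    """Return now minus the start time in whole milliseconds.

    Returns None when the document has no start field, where a runtime
    field would simply emit nothing.
    """
    start_text = doc.get("process.cpu.start")
    if start_text is None:
        return None
    start = datetime.fromisoformat(start_text.replace("Z", "+00:00"))
    # Integer division by a 1 ms timedelta avoids float rounding.
    return (now - start) // timedelta(milliseconds=1)


if __name__ == "__main__":
    # Hypothetical document; real values come from Metricbeat.
    sample_doc = {"process.cpu.start": "2023-01-01T12:00:00.000Z"}
    age_ms = start_age_millis(sample_doc, datetime.now(timezone.utc))
    # An alert on "younger than 5 minutes" compares against 5 * 60 * 1000 ms.
    print(age_ms is not None and age_ms < 5 * 60 * 1000)
```

Because the value is in milliseconds, any threshold on the new field has to be expressed in milliseconds too.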

PS: I deleted my Logstash filters, because they were superfluous.