Prometheus: count the times an IP is announced by a given node (MetalLB)


I'd like to detect when an IP jumps between nodes. Each time an IP jumps, it is announced by the new node, and that is visible via this Prometheus metric: metallb_speaker_announced

This metric will show the following info: metallb_speaker_announced{app_kubernetes_io_component="speaker", app_kubernetes_io_instance="metallb-system", app_kubernetes_io_name="metallb", instance="10.147.52.129:7472", ip="192.168.1.21", job="kubernetes-pods", kubernetes_namespace="metallb", kubernetes_node_name="node01", kubernetes_pod_name="metallb-system-spk-5whj5", node="node01", protocol="layer2", service="metallb/service-1"}

What would the PromQL expression look like if we wanted to detect whether an IP has been announced at least 3 times from at least 2 different nodes in the last 5 minutes?

For additional context: metallb_speaker_announced events are triggered by different types of events, and they are harmless as long as the Kubernetes node making the announcement stays the same. If the Kubernetes node making the announcement alternates, that is a relevant problem that could be caused by things like a flapping NIC on the node or other conditions.

DazWilkin

I'm unable to repro your example as I don't have MetalLB and a bunch of nodes but...

If we can assume that metallb_speaker_announced only triggers on a new node, the first firing will be the 1st node and the second firing will be a different, 2nd node. Any subsequent firings e.g. 3rd is either from the 1st node again or from a 3rd node. So, 2+ firings is guaranteed to be >=2 nodes.

Then, I think you can use sum_over_time(metallb_speaker_announced{}[5m]) to sum all the announcements for the last 5 minutes.

And then you can use sum by(ip) (sum_over_time(metallb_speaker_announced{}[5m])) to get the results summarized by ip.

And then you can use sum by(ip) (sum_over_time(metallb_speaker_announced{}[5m])) >= 3 to keep only those IPs that occurred >= 3 times.
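
If the assumption that an announcement only fires on a new node does not hold, the node condition can be checked explicitly. The following is an untested sketch that combines the threshold above with a count of distinct nodes per IP. It assumes the `node` label identifies the announcing node, as in the sample series from the question. Note also that sum_over_time adds up sample values, so if the metric is a gauge that stays at 1 while an IP is announced, the sum grows with the scrape interval rather than with the number of announcement events, and the threshold of 3 may need adjusting.

```promql
# Left side: per-IP sum of samples over the last 5 minutes (the approach above).
# Right side: number of distinct nodes that reported each IP in the same window.
# "and" keeps only the IPs that satisfy both conditions (both sides are keyed by ip).
(
  sum by (ip) (sum_over_time(metallb_speaker_announced[5m])) >= 3
)
and
(
  count by (ip) (
    # One series per (ip, node) pair, regardless of pod or instance labels.
    count by (ip, node) (
      max_over_time(metallb_speaker_announced[5m]) > 0
    )
  ) >= 2
)
```

The right-hand side on its own (everything inside the second pair of parentheses) is enough to alert on an IP that was announced from two or more nodes within the window.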