Does container_memory_failcnt not count the number of times that pods are OOM-killed?


Situation:
There are two containers in the Prometheus pod (config-reloader and prometheus).
I set resources.limits.memory to 50Mi and 32Gi respectively.

The metric container_memory_failcnt increased dramatically, from 10 to 8000 within 5 minutes (to be precise, I am looking at rate(container_memory_failcnt{}[5m])).

The metric container_memory_failcnt tells how many times the container has hit its memory limit.

But according to the metric container_memory_working_set_bytes, the prometheus container was using only 18Gi of memory, well below its 32Gi limit.
The pod was not OOM-killed either. Yet container_memory_failcnt increased dramatically.

Is being OOM-killed different from hitting the memory limit?

I would also like to know possible reasons why the prometheus container's memory usage grew to 18Gi within 5 minutes. (It usually uses 10Gi or less.)

There is 1 best solution below

After searching on Google for 2 days, I found the answer.
container_memory_failcnt really does count how many times the target container hits its memory limit.
But this metric goes together with container_memory_usage_bytes, which also includes page cache that the kernel can reclaim.

The metric container_memory_working_set_bytes, on the other hand, is the amount of memory the container is actually using right now.
The OOM killer watches this metric.

So in my case, container_memory_failcnt kept increasing, but container_memory_working_set_bytes stayed below the container's limits.memory, so the pod was not OOM-killed.
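
To make the difference concrete, here is a minimal sketch in Python of how the two metrics relate. It assumes the working set is computed the way cAdvisor does it (usage minus inactive file cache, floored at zero). The helper names and the 14Gi inactive-cache figure are hypothetical, chosen only so the numbers match the 32Gi limit and 18Gi working set from my case; they are not measured values.

```python
# Sketch of how usage, working set and the memory limit relate.
# The numbers used below are illustrative, not measurements.

GIB = 1024 ** 3


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Working set = usage minus inactive file cache, floored at zero."""
    return max(0, usage_bytes - inactive_file_bytes)


def classify(usage_bytes: int, inactive_file_bytes: int, limit_bytes: int) -> str:
    """Rough classification of a container's memory state (hypothetical helper)."""
    ws = working_set_bytes(usage_bytes, inactive_file_bytes)
    if ws >= limit_bytes:
        # Nothing reclaimable is left below the limit: OOM kill is likely.
        return "oom-risk"
    if usage_bytes >= limit_bytes:
        # Usage touches the limit (failcnt grows), but cache can be reclaimed.
        return "limit-hit-reclaimable"
    return "ok"


limit = 32 * GIB          # resources.limits.memory of the prometheus container
usage = 32 * GIB          # usage (including page cache) pressed against the limit
inactive_file = 14 * GIB  # assumed amount of reclaimable file cache

print(working_set_bytes(usage, inactive_file) // GIB)
print(classify(usage, inactive_file, limit))
```

With these inputs the usage sits at the limit while the working set is only 18Gi, which is the situation where the fail counter can climb without any OOM kill.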

Special thanks to Bob Cotton:

https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d