Is there a way to know what is causing a memory leak on a docker swarm?


We are running a Docker swarm and use Monit to watch resource utilisation. The process memory for dockerd keeps growing over time. This happens on every node that performs at least one docker action, e.g. `docker inspect` or `docker exec`. I suspect it is related to these actions, but I'm not sure how to replicate it. I have a script like

#!/bin/sh
set -eu

# Loop forever, re-reading the container list on each pass so that
# containers started or stopped in the meantime are picked up.
while true; do
    for container in $(docker container ls --format '{{.Names}}'); do
        echo "Running inspect on $container"
        CONTAINER_STATUS="$(docker inspect "$container" -f '{{.State.Status}}')"
        echo "$container is $CONTAINER_STATUS"
    done
done

but I'm open to other suggestions
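To check whether the growth really correlates with these actions, one idea is to sample dockerd's memory while the loop above runs. A minimal sketch; `rss_kib` is a hypothetical helper, not part of docker or Monit:

```shell
#!/bin/sh
# Print one timestamped sample of dockerd's total resident set size
# (RSS, in KiB). Run this in a loop (or under cron/watch) alongside the
# inspect script and line the samples up with the docker actions.
rss_kib() {
    # Sum RSS over all processes whose command name matches $1;
    # prints 0 if none are running.
    ps -C "$1" -o rss= | awk '{sum += $1} END {print sum + 0}'
}

printf '%s dockerd_rss_kib=%s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$(rss_kib dockerd)"
```

If the number only climbs right after bursts of `docker inspect`/`docker exec`, that narrows the suspect down considerably.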


Answer by Chris Becke:

Assuming you can run ansible to run a command via ssh on all servers:

ansible swarm -a "docker stats --no-stream"
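If ansible isn't available, a plain ssh loop over the node hostnames does the same thing; the `NODES` list here is a hypothetical placeholder for your swarm hosts:

```shell
#!/bin/sh
# Hypothetical node list; substitute your actual swarm hostnames.
NODES="node1 node2 node3"

for node in $NODES; do
    echo "== $node =="
    # One-shot snapshot of per-container memory/CPU on that node.
    ssh "$node" docker stats --no-stream
done
```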

A more SRE-style solution is containerd + Prometheus + Alertmanager/Grafana: gather metrics from the swarm nodes and implement alerting when container thresholds are exceeded.


Don't forget you can simply set resource constraints on Swarm services to limit the memory and CPU that service tasks can consume; tasks that exceed the memory limit are OOM-killed and restarted. Then just look for services that keep getting OOM-killed.
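As a sketch, such a constraint can be declared in the stack file under `deploy.resources` (the `web` service and its image are hypothetical):

```yaml
services:
  web:                 # hypothetical service name
    image: nginx:alpine
    deploy:
      resources:
        limits:
          cpus: "0.50"   # at most half a CPU per task
          memory: 256M   # tasks exceeding this are OOM-killed
```

After deploying, `docker service ps <service>` will show tasks whose state/error indicates repeated OOM kills.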