I'm playing around with Docker Swarm. I have a three-node cluster: 1 manager and 2 worker nodes. I'm using VIP mode for all my services.
I hit a weird situation after restarting one of the worker nodes. I ran docker node ls and the worker showed as Ready. docker service ls showed that the replicas running on that worker were healthy.
The problem: the node couldn't be reached through the ingress network. No container on the other nodes was able to access a container on that worker.
I checked the containers; they had all joined the ingress network. Curling the containers from within the same node worked. Pinging the service name from a container on the same malfunctioning node worked. Curling the worker's containers from the manager did not work, but curling them via the worker's IP address did!
I restarted the worker node again, but the issue persisted; then I restarted the whole cluster and everything worked again!
Is there any explanation for what I just witnessed? I'm mostly worried that this could happen in a production environment.
Thank you in advance.
This happens when the overlay networking ports are not open between the nodes (both workers and managers). Per Docker's documentation, the following ports need to be open:

- TCP port 2377 for cluster management communication (to managers)
- TCP and UDP port 7946 for communication among nodes
- UDP port 4789 for overlay network (VXLAN) traffic
Traffic may be blocked by iptables on either end, by a network router or firewall in the middle, or even by tools like VMware NSX. To verify connectivity end to end, you can run tcpdump on those ports on each node and confirm that packets leaving one node actually arrive at the other.
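As a concrete sketch (the interface name eth0 is a placeholder; use whichever interface your swarm nodes use to talk to each other, and run with root privileges):

```shell
# On the destination node: capture overlay (VXLAN) traffic on the default port.
tcpdump -i eth0 -nn udp port 4789

# On the source node, in parallel: capture the same port so the two
# captures can be compared.
tcpdump -i eth0 -nn udp port 4789

# While both captures run, generate traffic across the overlay (e.g. curl a
# service from a container on the source node). Packets that appear in the
# source capture but never in the destination capture indicate something in
# between is dropping the traffic.
```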
Relevant iptables rules for every node in the cluster are:
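A sketch of such rules, assuming the default swarm ports and that you append ACCEPT rules to the INPUT chain (the chain, rule ordering, and any source-address restrictions depend on your existing firewall policy):

```shell
# Cluster management traffic (only needs to be reachable on managers).
iptables -A INPUT -p tcp --dport 2377 -j ACCEPT
# Node-to-node gossip/discovery, on every node.
iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
iptables -A INPUT -p udp --dport 7946 -j ACCEPT
# Overlay network (VXLAN) traffic, on every node.
iptables -A INPUT -p udp --dport 4789 -j ACCEPT
```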
If you are unable to adjust the firewall settings, swarm mode can be configured to use an overlay networking port other than the default 4789 via the --data-path-port option to docker swarm init.
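For example, to carry the overlay (VXLAN) traffic on port 7777 instead of 4789 (7777 is an arbitrary choice here; note this can only be set when the swarm is first created):

```shell
docker swarm init --data-path-port 7777
```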