One of the containers intermittently can NOT access another container in swarm


I'm using Docker Swarm (17.06 CE) to orchestrate my microservices. The swarm cluster has 3 managers and 1 worker.

I have an Nginx image running globally on the swarm managers. I also have a Java-based microservice with 2 replicas in the same overlay network.

I have now found that one of the Nginx containers can NOT access the microservice. The other two Nginx containers can access the service without problem.

### there are three nginx containers in swarm  
➜  ~ docker service ps pilipa-prod-nginx 
ID                  NAME                             IMAGE                                             NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS 
qufld0uu8tk9        pilipa-prod-nginx.4r2p0t892qn55n4uewoymxbp0   registry.i-counting.cn/pilipa/prod/nginx:latest   node02              Running             Running 21 hours ago 
bwjw9c9dm8e1        pilipa-prod-nginx.ixw4urfkdcnkm326vgkw92x8n   registry.i-counting.cn/pilipa/prod/nginx:latest   node01              Running             Running 21 hours ago 
2w2gg83xt6g4        pilipa-prod-nginx.5t63dl8dcj603iyw5l5vv0xvx   registry.i-counting.cn/pilipa/prod/nginx:latest   node03              Running             Running 21 hours ago

### log in to the normal Nginx container; it can access the microservice without problem
➜  ~ docker exec --interactive --tty pilipa-prod-nginx.4r2p0t892qn55n4uewoymxbp0.qufld0uu8tk9ieubcimed8fgw sh
/ # ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
10901: eth0@if10902: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:00:00:2c brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.44/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.11/32 scope global eth0
       valid_lft forever preferred_lft forever
10903: eth1@if10904: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:09 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.9/16 scope global eth1
       valid_lft forever preferred_lft forever
/ # wget 10.0.0.71:8080
Connecting to 10.0.0.71:8080 (10.0.0.71:8080)
wget: server returned error: HTTP/1.1 401 Unauthorized

### log in to the problematic Nginx container, which can ping the microservice's host but can NOT access the service
➜  ~ docker exec --interactive --tty pilipa-prod-nginx.ixw4urfkdcnkm326vgkw92x8n.bwjw9c9dm8e1qlx64z5sniw7h sh
/ #
/ #
/ # wget 10.0.0.71:80
Connecting to 10.0.0.71:80 (10.0.0.71:80)
wget: can't connect to remote host (10.0.0.71): Connection refused
/ # ping 10.0.0.71
PING 10.0.0.71 (10.0.0.71): 56 data bytes
64 bytes from 10.0.0.71: seq=0 ttl=64 time=0.066 ms
64 bytes from 10.0.0.71: seq=1 ttl=64 time=0.076 ms
64 bytes from 10.0.0.71: seq=2 ttl=64 time=0.073 ms
^C
--- 10.0.0.71 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.066/0.071/0.076 ms

Update:

I used tcpdump to capture traffic on the microservice's host. I could capture the traffic from the normal Nginx container when using ping 10.0.0.71 and wget 10.0.0.71:8080 to access the service. However, no traffic was captured for either ping or wget from the problematic Nginx container!

Is this a known bug in swarm's overlay network, or a misconfiguration in my environment?
