I have a few images (TIBCO, WebLogic, Spring Boot, etc.) that suddenly started failing on some of our Linux servers after a patching cycle because they can't determine the hostname of the container.
I have a Docker swarm with one master and one worker. If I deploy these services on both nodes, one starts successfully and on the other I get an unknown host error. In fact, any image that references localhost in some way fails on my worker machine.
It appears that on the failing machine the container user can't read the /etc/resolv.conf and /etc/hosts files, and because of this ping localhost doesn't work. I have no idea how to fix this, and because it works on some servers and not on others, I don't think it's a code issue.
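To confirm the symptom without relying on ping, a quick readability check can be run inside the container as the application user. This is only a minimal sketch: the function name check_readable is made up for illustration, and the three paths are the ones shown in the transcripts in this post.

```shell
# Report whether each given file is readable by the current user.
# Prints "<path> readable" or "<path> NOT readable" per file and
# returns non-zero if at least one file could not be read.
check_readable() {
    rc=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "$f readable"
        else
            echo "$f NOT readable"
            rc=1
        fi
    done
    return "$rc"
}

# Inside the container, as the user the application runs as:
#   check_readable /etc/hosts /etc/hostname /etc/resolv.conf
```

If the files are reported as not readable on one node and readable on the other for the same image, the difference is on the host side rather than in the image.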
Error on the TIBCO container
Version 7.0.1 V4 2/27/2013
2018-09-30 11:40:01.095 FATAL: Could not resolve hostname '5802dab65aea'. Possibly default hostname is not configured properly while multiple network interfaces are present.
2018-09-30 11:40:01.095 FATAL: Exception in startup, exiting.
Exception on the WebLogic domain
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: d718d565dee5: d718d565dee5: Temporary failure in name resolution
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: d718d565dee5: d718d565dee5: Temporary failure in name resolution
Stopping Derby server...
Logging into the container on the faulty host
sh-4.2$ hostname
b73fe493e913
sh-4.2$ ping b73fe493e913
ping: unknown host b73fe493e913
sh-4.2$ ping localhost
ping: unknown host localhost
sh-4.2$ cat /etc/hosts
cat: /etc/hosts: Permission denied
sh-4.2$ cat /etc/resolv.conf
cat: /etc/resolv.conf: Permission denied
sh-4.2$ ls -ltr
-rw-r-----+ 1 root root 174 Sep 30 13:20 hosts
-rw-r-----+ 1 root root 13 Sep 30 13:20 hostname
-rw-r-----+ 1 root root 148 Sep 30 13:20 resolv.conf
Logging into the container on the working machine
sh-4.2$ hostname
2925d3058c7f
sh-4.2$ ping 2925d3058c7f
ping: icmp open socket: Operation not permitted
sh-4.2$ ping localhost
ping: icmp open socket: Operation not permitted
sh-4.2$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.0.0.252 2925d3058c7f
sh-4.2$ cat /etc/resolv.conf
search *.co.za *.holdings.co.za **.co.za *.corp *.com *.co.za
nameserver 127.0.0.11
options ndots:0
sh-4.2$ ls -ltr
-rw-r--r--. 1 root root 174 Sep 30 08:48 hosts
-rw-r--r--. 1 root root 13 Sep 30 08:48 hostname
-rw-r--r--. 1 root root 148 Sep 30 08:48 resolv.conf
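Note that ping fails on both machines, but for different reasons: "unknown host" is a name resolution failure, while "icmp open socket: Operation not permitted" only means the user may not open a raw socket. A resolver lookup separates the two cases. The sketch below assumes getent is available in the image; the function names can_resolve and report_resolution are hypothetical.

```shell
# Succeed when the system resolver (hosts file, DNS) can resolve the name.
# Uses getent, so no ICMP or raw socket permission is needed.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Print one line per name saying whether it resolves.
report_resolution() {
    for name in "$@"; do
        if can_resolve "$name"; then
            echo "$name resolves"
        else
            echo "$name does NOT resolve"
        fi
    done
}

# Inside the container:
#   report_resolution localhost "$(hostname)"
```

On the working machine both names would be expected to resolve even though ping itself is not permitted.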
Docker info
Containers: 112
Running: 18
Paused: 0
Stopped: 94
Images: 388
Server Version: 18.06.1-ce
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: nfg2fjye8i8ub1cx0jmgkb75x
Is Manager: false
Node Address: 172.22.141.179
Manager Addresses:
172.30.10.35:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.11.6.el7.x86_64
Operating System: Red Hat Enterprise Linux
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.74GiB
Name: #######
ID: O23F:WZTF:GV4Z:7WXU:3BI6:TY46:MIMR:JW6M:XPG4:XNWI:TO7H:CNZB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Docker version
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:03 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:25:29 2018
OS/Arch: linux/amd64
Experimental: false
That was indeed it. The Linux administrator ran ls -ald on /var/lib/docker/* and found that the /var/lib/docker/containers directory was marked with drwxr-x---+ (the trailing + indicates an ACL on the directory). When we removed it, the issue was resolved.
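For anyone checking their own hosts, the inspection can be sketched roughly as follows. Docker bind-mounts each container's /etc/hosts, /etc/hostname and /etc/resolv.conf from its directory under /var/lib/docker/containers, so a restrictive ACL there shows up inside the container. The helper names has_acl_marker and list_acl_entries are made up for illustration, and the setfacl line assumes that "removed it" above means removing the ACL from the directory; adjust to your situation.

```shell
# Succeed when an ls -l style mode string ends in "+",
# which is how ls marks an entry that has an ACL.
has_acl_marker() {
    case "$1" in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# Print every entry directly under a directory whose mode string ends in "+".
list_acl_entries() {
    dir=$1
    for p in "$dir"/*; do
        [ -e "$p" ] || continue
        mode=$(ls -ld "$p" | awk '{print $1}')
        if has_acl_marker "$mode"; then
            echo "$p $mode"
        fi
    done
}

# On the Docker host, as root:
#   list_acl_entries /var/lib/docker
#   getfacl /var/lib/docker/containers       # show the ACL entries
#   setfacl -b /var/lib/docker/containers    # -b removes all extended ACL entries
```

Files of containers created while the ACL was in place may keep their old permissions, so redeploying the affected services afterwards is probably the safest way to verify the fix.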