Setup: Docker CE 19.03.8, Swarm initialized with a single manager node and nothing more.
We deploy many new stacks per day, and sometimes I see the following line:
level=error msg="Failed to allocate network resources for node sdlk0t6pyfb7lxa2ie3w7fdzr" error="could not find network allocator state for network qnkxurc5etd2xrkb53ry0fu59" module=node node.id=yp0u6n9c31yh3xyekondzr4jc
After 2 to 3 days, no new services can be started because there are no free VIPs left. I then see the following lines in my logs:
level=error msg="Could not parse VIP address while releasing"
level=error msg="error deallocating vip" error="invalid CIDR address: " vip.addr= vip.network=oqcsj99taftdu3b0t3nrgbgy1
level=error msg="Event api.EventUpdateTask: Failed to get service idid0u7vjuxf2itpv8n31da57 for task 6vnc8jdkgxwxqbs3ixly2i6u4 state NEW: could not find service idid0u7vjuxf2itpv8n31da57" module=node ...
level=error msg="Event api.EventUpdateTask: Failed to get service sbjb7nk0wk31c2ayg8x898fhr for task noo21whnbwkyijnqavseirfg0 state NEW: could not find service sbjb7nk0wk31c2ayg8x898fhr" module=node ...
level=error msg="Failed to find network y73pnq85mjpn1pon38pdbtaw2 on node sdlk0t6pyfb7lxa2ie3w7fdzr" module=node node.id=yp0u6n9c31yh3xyekondzr4jc
We tried to investigate this by running the daemon in debug mode. Here are some lines that stood out to me:
level=debug msg="Remove interface veth84e7185 failed: Link not found"
level=debug msg="Remove interface veth64c3a65 failed: Link not found"
level=debug msg="Remove interface vethf1703f1 failed: Link not found"
level=debug msg="Remove interface vethe069254 failed: Link not found"
level=debug msg="Remove interface veth2b81763 failed: Link not found"
level=debug msg="Remove interface veth0bf3390 failed: Link not found"
level=debug msg="Remove interface veth2ed04cc failed: Link not found"
level=debug msg="Remove interface veth0bc27ef failed: Link not found"
level=debug msg="Remove interface veth444343f failed: Link not found"
level=debug msg="Remove interface veth036acf9 failed: Link not found"
level=debug msg="Remove interface veth62d7977 failed: Link not found"
and
level=debug msg="Request address PoolID:10.0.0.0/24 App: ipam/default/data, ID: GlobalDefault/10.0.0.0/24, DBIndex: 0x0, Bits: 256, Unselected: 60, Sequence: (0xf7dfeeee, 1)->(0xedddddb7, 1)->(0x77777777, 3)->(0x77777775, 1)->(0x77ffffff, 1)->(0xffd55555, 1)->end Curr:233 Serial:true PrefAddress:<
When the "Unselected" value in that line reaches 0, no new containers can be deployed. They stay stuck in the NEW state.
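To keep an eye on this counter, something like the following rough sketch could pull the numbers out of the "Request address" debug line. It assumes that each "(0x..., N)" entry in the Sequence is a 32-bit block repeated N times and that a 0 bit marks a free address; that reading is my interpretation, although it does reproduce the Unselected value for the line quoted above. The helper name pool_usage is made up for this example.

```python
import re

# Abbreviated copy of the debug line quoted above (trailing fields omitted).
LINE = (
    'level=debug msg="Request address PoolID:10.0.0.0/24 App: ipam/default/data, '
    'ID: GlobalDefault/10.0.0.0/24, DBIndex: 0x0, Bits: 256, Unselected: 60, '
    'Sequence: (0xf7dfeeee, 1)->(0xedddddb7, 1)->(0x77777777, 3)->'
    '(0x77777775, 1)->(0x77ffffff, 1)->(0xffd55555, 1)->end Curr:233'
)


def pool_usage(line):
    """Return (total_bits, unselected, free_bits_from_sequence) for one log line."""
    bits = int(re.search(r"Bits: (\d+)", line).group(1))
    unselected = int(re.search(r"Unselected: (\d+)", line).group(1))
    free = 0
    # Assumption: each "(0xBLOCK, N)" is a 32-bit block repeated N times,
    # and a 0 bit in a block marks a free address.
    for block, count in re.findall(r"\(0x([0-9a-f]+), (\d+)\)", line):
        zeros = 32 - bin(int(block, 16)).count("1")
        free += zeros * int(count)
    return bits, unselected, free


bits, unselected, free = pool_usage(LINE)
print(f"{unselected} of {bits} addresses free; sequence decodes to {free} free")
```

Running this over the debug log at intervals would show whether the free count only ever shrinks, which is what we would expect if addresses are not being released.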
Has anyone experienced something like this, or can someone help me? We believe the problem has something to do with the release of the 10.0.0.0/24 (our ingress network) addresses.
If you see your containers stuck in the NEW state, you are probably affected by this problem: https://github.com/moby/moby/issues/37338 reported by cintiadr: