Thread BLOCKED on InetAddress.getByName 0.0.0.0


My application sometimes hangs at startup because of a "deadlock" involving InetAddress.getByName, and it's not clear to me how to fix it.

To give some context, the two threads involved are not directly under my control:

  • 1 thread (BLOCKED) is starting a Prometheus HTTP server
  • 1 thread (RUNNABLE) is ZIO Http client library related, calling some Netty stuff as a client (not server)

The relevant code of the 1st thread is:

new InetSocketAddress("0.0.0.0", somePort)

And the second:

static final InetAddress INET6_ANY = InetAddress.getByName("::")
static final InetAddress INET_ANY = InetAddress.getByName("0.0.0.0")

I've read that using InetAddress may involve blocking calls, but why would it hang forever? Especially since we're referring to the local address 0.0.0.0 and not some remote address.

The app runs in a container in Kubernetes, in case that explains something.

$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.126.184   my-app-756f44d67-tgr5b

Note that this is not always reproducible but we've seen several occurrences lately.

Could this be a "bug", as in some misuse of the libraries? Or am I missing something that must be configured for such code to work in a Kubernetes context?


For completeness, here is the thread dump for these 2 threads.

The one BLOCKED:

"ZScheduler-Worker-6" #30 daemon prio=5 os_prio=0 cpu=276.23ms elapsed=7541.20s tid=0x00007f7bd54d4880 nid=0x55 waiting for monitor entry  [0x00007f7b663f4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
    - waiting to lock <0x00000000a02e9a90> (a java.util.HashSet)
    at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
    at jdk.internal.loader.NativeLibraries.findFromPaths([email protected]/Unknown Source)
    at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
    at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
    at jdk.internal.loader.BootLoader.loadLibrary([email protected]/Unknown Source)
    at java.net.InetAddress.<clinit>([email protected]/Unknown Source)
    at java.net.InetSocketAddress.<init>([email protected]/Unknown Source)
    at io.prometheus.metrics.exporter.httpserver.HTTPServer$Builder.makeInetSocketAddress(HTTPServer.java:209)
    at io.prometheus.metrics.exporter.httpserver.HTTPServer$Builder.buildAndStart(HTTPServer.java:197)
    at io.opentelemetry.exporter.prometheus.PrometheusHttpServer.<init>(PrometheusHttpServer.java:71)
    at io.opentelemetry.exporter.prometheus.PrometheusHttpServerBuilder.build(PrometheusHttpServerBuilder.java:68)
    at com.myapp.metrics.sdk.PrometheusMetricReader$.$anonfun$startReader$2(PrometheusMetricReader.scala:21)
    at com.myapp.metrics.sdk.PrometheusMetricReader$$$Lambda$1109/0x00007f7b7840e078.apply(Unknown Source)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:100)
    at zio.ZIOCompanionVersionSpecific$$Lambda$430/0x00007f7b782ba000.apply(Unknown Source)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:904)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
    at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
    at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
    at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
    at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:478)

The one "locking":

"ZScheduler-Worker-20" #44 daemon prio=5 os_prio=0 cpu=191.27ms elapsed=7541.20s tid=0x00007f7bd54e2e70 nid=0x63 in Object.wait()  [0x00007f7b655e3000]
   java.lang.Thread.State: RUNNABLE
    at io.netty.channel.epoll.LinuxSocket.unsafeInetAddrByName(LinuxSocket.java:364)
    - waiting on the Class initialization monitor for java.net.InetAddress
    at io.netty.channel.epoll.LinuxSocket.<clinit>(LinuxSocket.java:42)
    at jdk.internal.loader.NativeLibraries.load([email protected]/Native Method)
    at jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open([email protected]/Unknown Source)
    at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
    - locked <0x00000000a02e9a90> (a java.util.HashSet)
    at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
    at java.lang.ClassLoader.loadLibrary([email protected]/Unknown Source)
    at java.lang.Runtime.load0([email protected]/Unknown Source)
    at java.lang.System.load([email protected]/Unknown Source)
    at io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:36)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0([email protected]/Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke([email protected]/Unknown Source)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/Unknown Source)
    at java.lang.reflect.Method.invoke([email protected]/Unknown Source)
    at io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:430)
    at java.security.AccessController.executePrivileged([email protected]/Unknown Source)
    at java.security.AccessController.doPrivileged([email protected]/Unknown Source)
    at io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:422)
    at io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:388)
    at io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:218)
    at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:323)
    at io.netty.channel.epoll.Native.<clinit>(Native.java:85)
    at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:40)
    at zio.http.netty.ChannelFactories$Client$.$anonfun$fromConfig$4(ChannelFactories.scala:83)
    at zio.http.netty.ChannelFactories$Client$$$Lambda$966/0x00007f7b783d3e60.apply(Unknown Source)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
    at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
    at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
    at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
    at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:478)

Best answer

Update

This has been fixed in the Netty repository and will be included in Netty 4.1.108.Final.

What happens

From your thread dumps, I can see the following:

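  • In the blocked thread ("ZScheduler-Worker-6"), the class initializer of InetAddress (triggered by new InetSocketAddress) needs to load a native library. To do that it must lock the java.util.HashSet at <0x00000000a02e9a90>, but it cannot, because another thread holds that lock while loading a native library of its own.
  • The other thread ("ZScheduler-Worker-20") attempts to use InetAddress as part of loading a native library. Specifically, the class initializer of Netty's Native class loads a native library (netty_transport_native_epoll), which in turn makes an upcall to LinuxSocket (or at least initializes it). The initializer of LinuxSocket calls InetAddress, so this thread waits for the class initialization of InetAddress to finish, and that initialization is stuck on the first thread.

So each thread waits for something the other one holds, and neither can make progress. The underlying problem is that Netty uses InetAddress while loading a native library, which can happen while InetAddress itself is still being initialized.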
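
To make the cycle concrete, here is a minimal sketch that reproduces the same shape of deadlock without Netty or InetAddress. All names in it are hypothetical stand-ins: LIBRARY_LOCK plays the role of the HashSet monitor from the thread dump, Resolver plays the role of InetAddress, and the two latches only exist to force the unlucky interleaving that happens occasionally in the real application.

```java
import java.util.concurrent.CountDownLatch;

final class ClassInitDeadlockSketch {

    // Hypothetical stand-in for the monitor held while a native library is loaded.
    static final Object LIBRARY_LOCK = new Object();
    static final CountDownLatch lockHeld = new CountDownLatch(1);
    static final CountDownLatch initStarted = new CountDownLatch(1);

    // Hypothetical stand-in for InetAddress: its class initializer needs LIBRARY_LOCK.
    static final class Resolver {
        static {
            initStarted.countDown();
            awaitQuietly(lockHeld);        // wait until the other thread owns the lock
            synchronized (LIBRARY_LOCK) {  // blocks forever: the "loader" thread holds it
                // the native library would be loaded here
            }
        }

        static void touch() { }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts both threads and returns true if both are still stuck after the timeouts.
    static boolean reproduce(long timeoutMillis) {
        // Like the Prometheus thread: triggers the class initializer of Resolver.
        Thread initializer = new Thread(() -> Resolver.touch(), "initializer");
        // Like the Netty thread: holds the lock, then needs the initialized class.
        Thread loader = new Thread(() -> {
            synchronized (LIBRARY_LOCK) {
                lockHeld.countDown();
                awaitQuietly(initStarted);
                Resolver.touch();          // waits for the initializer running on the other thread
            }
        }, "loader");
        initializer.setDaemon(true);       // daemon threads, so the JVM can still exit
        loader.setDaemon(true);
        initializer.start();
        loader.start();
        try {
            initializer.join(timeoutMillis);
            loader.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return initializer.isAlive() && loader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads stuck: " + reproduce(1000));
    }
}
```

A thread dump taken while this sketch hangs should look much like the one in the question: one thread BLOCKED on a monitor inside a class initializer, and one thread that owns the monitor while waiting for that class initialization to finish.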

A workaround

You can make sure that InetAddress is fully initialized before giving Netty a chance to do anything, for example by calling InetAddress.getLocalHost(); at the beginning of your main. As long as that call runs before Netty is used anywhere, InetAddress is already initialized by the time Netty loads its native library, so the two initializations can no longer wait on each other.
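
A minimal sketch of that workaround could look like the following. It assumes a plain Java entry point; the class name InetAddressWarmUp and the method warmUp are made up for illustration. It also swaps InetAddress.getLocalHost() for InetAddress.getLoopbackAddress(): any static call on InetAddress forces its class initializer to run, and this one neither performs a host name lookup nor throws a checked exception.

```java
import java.net.InetAddress;

final class InetAddressWarmUp {

    // Forces the class initializer of InetAddress to run (and finish) on the
    // calling thread. The returned address itself is not important.
    static InetAddress warmUp() {
        return InetAddress.getLoopbackAddress();
    }

    public static void main(String[] args) {
        InetAddress loopback = warmUp();
        System.out.println("InetAddress initialized, loopback = " + loopback);
        // Start the rest of the application (metrics server, HTTP client, ...)
        // only after this point.
    }
}
```

In a ZIO application, the equivalent call would have to run before any fiber can touch the Netty-based client or the Prometheus server, for example as the very first step of the main entry point.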

Actually solving the problem

You can file a bug report with the Netty team (or even write a pull request yourself).

One solution they could implement is to initialize InetAddress themselves before loading native libraries (that rely on it being loaded/loadable).

For example, they could add an InetAddress.getLocalHost(); call to the Native class before actually loading anything (e.g. at the beginning of Native.loadNativeLibrary).

Alternatively, it might even be possible to change things so that loading the native library doesn't require InetAddress at all. However, I don't know enough about Netty's internals to judge that.