My application sometimes hangs at startup because of what looks like a deadlock involving InetAddress.getByName,
and it's not clear to me how to fix it.
To give some context, the 2 threads involved are not directly under my control:
- 1 thread (BLOCKED) is starting a Prometheus HTTP server
- 1 thread (RUNNABLE) belongs to the ZIO HTTP client library and calls into Netty as a client (not a server)
The relevant code of the 1st thread is:
new InetSocketAddress("0.0.0.0", somePort)
And the second:
static final InetAddress INET6_ANY = InetAddress.getByName("::")
static final InetAddress INET_ANY = InetAddress.getByName("0.0.0.0")
I've read that using InetAddress may involve some blocking, but why would it hang forever, especially since we're referring to the local address 0.0.0.0 and not some remote address?
The app runs in a container in Kubernetes, in case that explains anything.
$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.126.184 my-app-756f44d67-tgr5b
Note that this is not always reproducible, but we've seen several occurrences lately.
Could this be a "bug", as in a misuse of the libraries? Or am I missing something that must be defined for such code to work in a Kubernetes context?
For completeness, here is the thread dump for these 2 threads.
The one BLOCKED:
"ZScheduler-Worker-6" #30 daemon prio=5 os_prio=0 cpu=276.23ms elapsed=7541.20s tid=0x00007f7bd54d4880 nid=0x55 waiting for monitor entry [0x00007f7b663f4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
- waiting to lock <0x00000000a02e9a90> (a java.util.HashSet)
at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
at jdk.internal.loader.NativeLibraries.findFromPaths([email protected]/Unknown Source)
at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
at jdk.internal.loader.BootLoader.loadLibrary([email protected]/Unknown Source)
at java.net.InetAddress.<clinit>([email protected]/Unknown Source)
at java.net.InetSocketAddress.<init>([email protected]/Unknown Source)
at io.prometheus.metrics.exporter.httpserver.HTTPServer$Builder.makeInetSocketAddress(HTTPServer.java:209)
at io.prometheus.metrics.exporter.httpserver.HTTPServer$Builder.buildAndStart(HTTPServer.java:197)
at io.opentelemetry.exporter.prometheus.PrometheusHttpServer.<init>(PrometheusHttpServer.java:71)
at io.opentelemetry.exporter.prometheus.PrometheusHttpServerBuilder.build(PrometheusHttpServerBuilder.java:68)
at com.myapp.metrics.sdk.PrometheusMetricReader$.$anonfun$startReader$2(PrometheusMetricReader.scala:21)
at com.myapp.metrics.sdk.PrometheusMetricReader$$$Lambda$1109/0x00007f7b7840e078.apply(Unknown Source)
at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:100)
at zio.ZIOCompanionVersionSpecific$$Lambda$430/0x00007f7b782ba000.apply(Unknown Source)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:904)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:478)
The one "locking":
"ZScheduler-Worker-20" #44 daemon prio=5 os_prio=0 cpu=191.27ms elapsed=7541.20s tid=0x00007f7bd54e2e70 nid=0x63 in Object.wait() [0x00007f7b655e3000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.LinuxSocket.unsafeInetAddrByName(LinuxSocket.java:364)
- waiting on the Class initialization monitor for java.net.InetAddress
at io.netty.channel.epoll.LinuxSocket.<clinit>(LinuxSocket.java:42)
at jdk.internal.loader.NativeLibraries.load([email protected]/Native Method)
at jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open([email protected]/Unknown Source)
at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
- locked <0x00000000a02e9a90> (a java.util.HashSet)
at jdk.internal.loader.NativeLibraries.loadLibrary([email protected]/Unknown Source)
at java.lang.ClassLoader.loadLibrary([email protected]/Unknown Source)
at java.lang.Runtime.load0([email protected]/Unknown Source)
at java.lang.System.load([email protected]/Unknown Source)
at io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:36)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0([email protected]/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke([email protected]/Unknown Source)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/Unknown Source)
at java.lang.reflect.Method.invoke([email protected]/Unknown Source)
at io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:430)
at java.security.AccessController.executePrivileged([email protected]/Unknown Source)
at java.security.AccessController.doPrivileged([email protected]/Unknown Source)
at io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:422)
at io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:388)
at io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:218)
at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:323)
at io.netty.channel.epoll.Native.<clinit>(Native.java:85)
at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:40)
at zio.http.netty.ChannelFactories$Client$.$anonfun$fromConfig$4(ChannelFactories.scala:83)
at zio.http.netty.ChannelFactories$Client$$$Lambda$966/0x00007f7b783d3e60.apply(Unknown Source)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:478)
Update
This has been fixed in the Netty repository and will be included in Netty 4.1.108.Final.
What happens
From your thread dumps, I can see the following:
- One thread is initializing InetAddress, which waits for a native library to be loaded, but it cannot proceed because another thread is loading a native library.
- That other thread needs InetAddress as part of loading a native library. Specifically, the class initializer of Netty's Native class loads a native library (netty_transport_native_epoll), which in turn makes an upcall to LinuxSocket (or at least initializes it), which requires InetAddress to be loaded.
So the problem is that Netty uses InetAddress while loading a native library, which can occur while another thread is still initializing InetAddress.
A workaround
You can make sure that InetAddress is fully initialized before giving Netty a chance to do anything. You can do that by running InetAddress.getLocalHost(); at the beginning of your main. That should run before Netty is used anywhere, and it should initialize InetAddress.
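A minimal sketch of that workaround follows. The class name InetAddressWarmup and the method warmUp are hypothetical names chosen for illustration. It also swaps in InetAddress.getLoopbackAddress() for getLocalHost(): both are static methods, so either call triggers the class initializer, but getLoopbackAddress() declares no checked exception and, as far as I know, does not perform a host name lookup.

```java
import java.net.InetAddress;

public final class InetAddressWarmup {
    private InetAddressWarmup() {}

    /**
     * Forces java.net.InetAddress through its static initializer on the
     * calling thread by invoking one of its static methods.
     */
    public static InetAddress warmUp() {
        // Assumption: any static method call would do here; the loopback
        // accessor is used because it throws no checked exception.
        return InetAddress.getLoopbackAddress();
    }

    public static void main(String[] args) {
        // Call this first, before anything can touch Netty or start the
        // Prometheus HTTP server on another thread.
        InetAddress loopback = warmUp();
        System.out.println("InetAddress initialized, loopback = " + loopback);
        // ... start the rest of the application here ...
    }
}
```

In a ZIO application the same call would have to run before any fiber that uses the HTTP client or the metrics exporter is forked, for example as the very first statement of the entry point.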
Actually solving the problem
You can file a bug report with the Netty team (or even write a pull request yourself).
One solution they could implement is to initialize InetAddress themselves before loading native libraries (that rely on it being loaded/loadable). For example, they could add an InetAddress.getLocalHost(); call to the Native class before actually loading anything (e.g. at the beginning of Native.loadNativeLibrary).
Alternatively, it might even be possible to change things so that loading the native library doesn't require InetAddress at all. However, I don't have sufficient knowledge about Netty (internals) to be able to judge that.
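For reference, the lock cycle described under "What happens" can be reproduced in isolation without Netty or the JDK's native library loader. The sketch below is only an analogy: LIBRARY_LOCK stands in for the lock the JDK holds while loading a native library, and the nested class Address stands in for InetAddress. All names are hypothetical. Daemon threads and join timeouts are used so the demo itself terminates.

```java
import java.util.concurrent.CountDownLatch;

public class ClassInitDeadlockDemo {
    // Stands in for the lock held while a native library is being loaded.
    static final Object LIBRARY_LOCK = new Object();
    static final CountDownLatch lockHeld = new CountDownLatch(1);
    static final CountDownLatch initStarted = new CountDownLatch(1);

    // Stands in for InetAddress: its static initializer needs LIBRARY_LOCK.
    static class Address {
        static {
            initStarted.countDown();      // class initialization is now in progress
            awaitQuietly(lockHeld);       // make sure the other thread owns the lock
            synchronized (LIBRARY_LOCK) { // blocks: the loader thread never releases it
            }
        }
        static void touch() {}
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if both threads are still stuck after the timeouts. */
    public static boolean reproduce() throws InterruptedException {
        // Plays the Netty thread: holds the library lock, then needs Address.
        Thread loader = new Thread(() -> {
            synchronized (LIBRARY_LOCK) {
                lockHeld.countDown();
                awaitQuietly(initStarted);
                Address.touch(); // waits for the class initializer to finish
            }
        }, "loader");
        // Plays the Prometheus thread: runs the class initializer of Address.
        Thread initializer = new Thread(() -> Address.touch(), "initializer");
        loader.setDaemon(true);
        initializer.setDaemon(true);
        loader.start();
        initializer.start();
        loader.join(500);
        initializer.join(500);
        return loader.isAlive() && initializer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked: " + reproduce());
    }
}
```

The latches only make the interleaving deterministic. In the real application the same interleaving depends on timing, which would be consistent with the hang not being reproducible on every start.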