WildFly/JGroups DNS_PING discovery mechanism seems to leak threads


We are currently facing a problem with a WildFly/JGroups cluster in a Kubernetes environment. We run a varying number of WildFly (30.0.0) nodes that need to communicate with each other and form a cluster for ActiveMQ Artemis JMS message handling. We use `dns.DNS_PING` for discovery and TCP as the JGroups transport.

We use the following WildFly CLI commands to set up the JGroups cluster:

    echo "Kubernetes interface and bindings"
    /interface=kubernetes:add(nic=eth0)
    /interface=private:add(inet-address="${jboss.bind.address.private:127.0.0.1}")
    /interface=dns:add(site-local-address=true)
    /socket-binding-group=standard-sockets/socket-binding=jgroups-tcp:add(interface=dns, port=7800)
    /socket-binding-group=standard-sockets/socket-binding=jgroups-tcp-fd:add(interface=dns, port=57800)
    /socket-binding-group=standard-sockets/socket-binding=http:write-attribute(name=interface,value=dns)
    /socket-binding-group=standard-sockets/socket-binding=https:write-attribute(name=interface,value=dns)

    echo "JGroups"
    /extension=org.jboss.as.clustering.jgroups:add()
    /subsystem=jgroups:add()
    #/subsystem=jgroups:write-attribute(name=default-stack,value=tcp)

    echo "TCP stack"
    batch
    /subsystem=jgroups/stack=tcp:add()
    #/subsystem=jgroups/stack=tcp:add
    /subsystem=jgroups/stack=tcp/transport=TCP:add(socket-binding=jgroups-tcp)
    /subsystem=jgroups/stack=tcp/protocol=MERGE3:add
    /subsystem=jgroups/stack=tcp/protocol=FD_SOCK:add(socket-binding=jgroups-tcp-fd)
    /subsystem=jgroups/stack=tcp/protocol=VERIFY_SUSPECT:add
    /subsystem=jgroups/stack=tcp/protocol=pbcast.NAKACK2:add
    /subsystem=jgroups/stack=tcp/protocol=UNICAST3:add
    /subsystem=jgroups/stack=tcp/protocol=pbcast.STABLE:add
    /subsystem=jgroups/stack=tcp/protocol=pbcast.GMS:add
    /subsystem=jgroups/stack=tcp/protocol=MFC:add
    /subsystem=jgroups/stack=tcp/protocol=FRAG3:add
    run-batch

    echo "JGroups Channel"
    /subsystem=jgroups/channel=ee:add(stack=tcp)
    /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
    #/subsystem=jgroups/channel=ee:write-attribute(name=cluster,value=kubernetes)
    /subsystem=jgroups:write-attribute(name=default-channel,value=ee)

    echo "DNS_PING Protocol"
    /subsystem=jgroups/stack=tcp/protocol=dns.DNS_PING:add(add-index=0,properties={dns_query="_ping._tcp.avaloq-wb-sync-manager-ping.namespace001.svc.cluster.local.",dns_record_type=SRV})

The DNS_PING query points to a Kubernetes service that exposes the nodes we want to have in the cluster.
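
To check what the nodes actually resolve, the same SRV query can be issued from inside a pod with the JDK's built-in JNDI DNS provider. This is a minimal sketch, not DNS_PING's own code: it assumes the `com.sun.jndi.dns.DnsContextFactory` provider that ships with the JDK and the standard SRV text form `priority weight port target`; `SrvLookup` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class SrvLookup {

    /** One SRV answer: the host and port a discovery request would be sent to. */
    public static final class Target {
        public final String host;
        public final int port;

        Target(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses one SRV record in its text form "priority weight port target". */
    public static Target parseSrv(String record) {
        String[] parts = record.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an SRV record: " + record);
        }
        String host = parts[3];
        // Drop the trailing dot of the fully qualified name.
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        return new Target(host, Integer.parseInt(parts[2]));
    }

    /** Runs an SRV query via JNDI; without a provider URL the platform's DNS servers are used. */
    public static List<Target> lookup(String query) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(query, new String[] {"SRV"});
            Attribute srv = attrs.get("SRV");
            List<Target> targets = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> values = srv.getAll();
                while (values.hasMore()) {
                    targets.add(parseSrv(values.next().toString()));
                }
            }
            return targets;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        String query = args.length > 0 ? args[0]
                : "_ping._tcp.avaloq-wb-sync-manager-ping.namespace001.svc.cluster.local.";
        for (Target t : lookup(query)) {
            System.out.println(t);
        }
    }
}
```

Run it from inside one of the pods (optionally with the query as its only argument) and compare the printed `host:port` pairs with the pods that are actually running.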

On a production deployment, however, DNS_PING creates a massive number of threads. We also see that one thread blocks all the others while it hangs in `PlainSocketImpl.socketConnect`. We have set JGroups' `sock_conn_timeout` to 300 milliseconds, so this connect should not block for long.
Eventually, WildFly cannot start any more threads because the OS refuses to create new native threads. We are still unsure what exactly causes this, but we suspect the file descriptor limit is being reached. By then there are around 4000 threads, approximately 75% of which are DNS_PING related.
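
The two dumps below fit a simple pattern: each discovery round appears to run on its own `Timer temp thread-N`, and every round queues on the same fair `ReentrantLock` in `BaseServer.getConnection` while one thread is stuck in `connect`. The sketch below reproduces only that pattern with JDK classes (it does not use JGroups; the `owner` thread and the round count are hypothetical stand-ins) to show that a single lock holder that never returns is enough to make the number of parked threads grow with every round.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockPileUp {

    /**
     * Starts one "owner" thread that holds a fair lock until released, then starts
     * `rounds` threads that each try to take the same lock, mimicking discovery
     * rounds that all need the connection-creation lock.
     * Returns the number of threads observed queued on the lock.
     */
    public static int simulate(int rounds) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock(true); // fair, like the FairSync in the dumps
        CountDownLatch ownerHasLock = new CountDownLatch(1);
        CountDownLatch releaseOwner = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            lock.lock();
            try {
                ownerHasLock.countDown();
                releaseOwner.await(); // stand-in for a connect() that does not return
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "owner");
        owner.start();
        ownerHasLock.await();

        List<Thread> waiters = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            Thread t = new Thread(() -> {
                try {
                    lock.lockInterruptibly(); // same call as in the waiting thread's stack
                    lock.unlock();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "round-" + i);
            t.start();
            waiters.add(t);
        }

        // Wait (bounded) until every round is queued on the lock.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (lock.getQueueLength() < rounds && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        int queued = lock.getQueueLength();

        // Clean up: let the owner finish so all waiters can acquire and exit.
        releaseOwner.countDown();
        owner.join();
        for (Thread t : waiters) {
            t.join();
        }
        return queued;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads queued on the lock: " + simulate(50));
    }
}
```

None of the waiters has a deadline of its own (`lockInterruptibly()` waits indefinitely), so in this model the only bound on the pile-up is how long the lock holder spends in `connect`.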

The hanging thread looks like this:

        {
            "thread-id" => 109424945L,
            "thread-name" => "Timer temp thread-20460,ee,avaloq-wb-sync-manager-0",
            "thread-state" => "RUNNABLE",
            "blocked-time" => -1L,
            "blocked-count" => 1L,
            "waited-time" => -1L,
            "waited-count" => 1L,
            "lock-info" => undefined,
            "lock-name" => undefined,
            "lock-owner-id" => -1L,
            "lock-owner-name" => undefined,
            "stack-trace" => [
                {
                    "file-name" => "PlainSocketImpl.java",
                    "line-number" => -2,
                    "class-name" => "java.net.PlainSocketImpl",
                    "method-name" => "socketConnect",
                    "native-method" => true
                },
                {
                    "file-name" => "AbstractPlainSocketImpl.java",
                    "line-number" => 412,
                    "class-name" => "java.net.AbstractPlainSocketImpl",
                    "method-name" => "doConnect",
                    "native-method" => false
                },
                {
                    "file-name" => "AbstractPlainSocketImpl.java",
                    "line-number" => 255,
                    "class-name" => "java.net.AbstractPlainSocketImpl",
                    "method-name" => "connectToAddress",
                    "native-method" => false
                },
                {
                    "file-name" => "AbstractPlainSocketImpl.java",
                    "line-number" => 237,
                    "class-name" => "java.net.AbstractPlainSocketImpl",
                    "method-name" => "connect",
                    "native-method" => false
                },
                {
                    "file-name" => "SocksSocketImpl.java",
                    "line-number" => 392,
                    "class-name" => "java.net.SocksSocketImpl",
                    "method-name" => "connect",
                    "native-method" => false
                },
                {
                    "file-name" => "Socket.java",
                    "line-number" => 609,
                    "class-name" => "java.net.Socket",
                    "method-name" => "connect",
                    "native-method" => false
                },
                {
                    "file-name" => "Util.java",
                    "line-number" => 461,
                    "class-name" => "org.jgroups.util.Util",
                    "method-name" => "connect",
                    "native-method" => false
                },
                {
                    "file-name" => "TcpConnection.java",
                    "line-number" => 96,
                    "class-name" => "org.jgroups.blocks.cs.TcpConnection",
                    "method-name" => "connect",
                    "native-method" => false
                },
                {
                    "file-name" => "TcpConnection.java",
                    "line-number" => 88,
                    "class-name" => "org.jgroups.blocks.cs.TcpConnection",
                    "method-name" => "connect",
                    "native-method" => false
                },
                {
                    "file-name" => "BaseServer.java",
                    "line-number" => 295,
                    "class-name" => "org.jgroups.blocks.cs.BaseServer",
                    "method-name" => "getConnection",
                    "native-method" => false
                },
                {
                    "file-name" => "BaseServer.java",
                    "line-number" => 208,
                    "class-name" => "org.jgroups.blocks.cs.BaseServer",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "TCP.java",
                    "line-number" => 91,
                    "class-name" => "org.jgroups.protocols.TCP",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "BasicTCP.java",
                    "line-number" => 146,
                    "class-name" => "org.jgroups.protocols.BasicTCP",
                    "method-name" => "sendUnicast",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1638,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "sendToSingleMember",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1632,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "doSend",
                    "native-method" => false
                },
                {
                    "file-name" => "NoBundler.java",
                    "line-number" => 38,
                    "class-name" => "org.jgroups.protocols.NoBundler",
                    "method-name" => "sendSingleMessage",
                    "native-method" => false
                },
                {
                    "file-name" => "NoBundler.java",
                    "line-number" => 30,
                    "class-name" => "org.jgroups.protocols.NoBundler",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1620,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1353,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "_send",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1262,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "down",
                    "native-method" => false
                },
                {
                    "file-name" => "DNS_PING.java",
                    "line-number" => 189,
                    "class-name" => "org.jgroups.protocols.dns.DNS_PING",
                    "method-name" => "sendDiscoveryRequest",
                    "native-method" => false
                },
                {
                    "file-name" => "DNS_PING.java",
                    "line-number" => 182,
                    "class-name" => "org.jgroups.protocols.dns.DNS_PING",
                    "method-name" => "findMembers",
                    "native-method" => false
                },
                {
                    "file-name" => "Discovery.java",
                    "line-number" => 217,
                    "class-name" => "org.jgroups.protocols.Discovery",
                    "method-name" => "invokeFindMembers",
                    "native-method" => false
                },
                {
                    "file-name" => "Discovery.java",
                    "line-number" => 228,
                    "class-name" => "org.jgroups.protocols.Discovery",
                    "method-name" => "lambda$findMembers$0",
                    "native-method" => false
                },
                {
                    "file-name" => undefined,
                    "line-number" => -1,
                    "class-name" => "org.jgroups.protocols.Discovery$$Lambda$968/0x0000000840b0bc40",
                    "method-name" => "run",
                    "native-method" => false
                },
                {
                    "file-name" => "TimeScheduler3.java",
                    "line-number" => 324,
                    "class-name" => "org.jgroups.util.TimeScheduler3$Task",
                    "method-name" => "run",
                    "native-method" => false
                },
                {
                    "file-name" => "ContextReferenceExecutor.java",
                    "line-number" => 49,
                    "class-name" => "org.jboss.as.clustering.context.ContextReferenceExecutor",
                    "method-name" => "execute",
                    "native-method" => false
                },
                {
                    "file-name" => "ContextualExecutor.java",
                    "line-number" => 70,
                    "class-name" => "org.jboss.as.clustering.context.ContextualExecutor$1",
                    "method-name" => "run",
                    "native-method" => false
                },
                {
                    "file-name" => "Thread.java",
                    "line-number" => 829,
                    "class-name" => "java.lang.Thread",
                    "method-name" => "run",
                    "native-method" => false
                }
            ],
            "suspended" => false,
            "in-native" => false,
            "locked-monitors" => [{
                "class-name" => "java.net.SocksSocketImpl",
                "identity-hash-code" => 139076230,
                "locked-stack-depth" => 1,
                "locked-stack-frame" => {
                    "file-name" => "AbstractPlainSocketImpl.java",
                    "line-number" => 412,
                    "class-name" => "java.net.AbstractPlainSocketImpl",
                    "method-name" => "doConnect",
                    "native-method" => false
                }
            }],
            "locked-synchronizers" => [{
                "class-name" => "java.util.concurrent.locks.ReentrantLock$FairSync",
                "identity-hash-code" => 740591308
            }]
        },

And a typical waiting thread, parked on the lock held by the thread above:

        {
            "thread-id" => 109424946L,
            "thread-name" => "Timer temp thread-20461,ee,avaloq-wb-sync-manager-0",
            "thread-state" => "WAITING",
            "blocked-time" => -1L,
            "blocked-count" => 1L,
            "waited-time" => -1L,
            "waited-count" => 1L,
            "lock-info" => {
                "class-name" => "java.util.concurrent.locks.ReentrantLock$FairSync",
                "identity-hash-code" => 740591308
            },
            "lock-name" => "java.util.concurrent.locks.ReentrantLock$FairSync@2c2486cc",
            "lock-owner-id" => 109424945L,
            "lock-owner-name" => "Timer temp thread-20460,ee,avaloq-wb-sync-manager-0",
            "stack-trace" => [
                {
                    "file-name" => "Unsafe.java",
                    "line-number" => -2,
                    "class-name" => "jdk.internal.misc.Unsafe",
                    "method-name" => "park",
                    "native-method" => true
                },
                {
                    "file-name" => "LockSupport.java",
                    "line-number" => 194,
                    "class-name" => "java.util.concurrent.locks.LockSupport",
                    "method-name" => "park",
                    "native-method" => false
                },
                {
                    "file-name" => "AbstractQueuedSynchronizer.java",
                    "line-number" => 885,
                    "class-name" => "java.util.concurrent.locks.AbstractQueuedSynchronizer",
                    "method-name" => "parkAndCheckInterrupt",
                    "native-method" => false
                },
                {
                    "file-name" => "AbstractQueuedSynchronizer.java",
                    "line-number" => 943,
                    "class-name" => "java.util.concurrent.locks.AbstractQueuedSynchronizer",
                    "method-name" => "doAcquireInterruptibly",
                    "native-method" => false
                },
                {
                    "file-name" => "AbstractQueuedSynchronizer.java",
                    "line-number" => 1263,
                    "class-name" => "java.util.concurrent.locks.AbstractQueuedSynchronizer",
                    "method-name" => "acquireInterruptibly",
                    "native-method" => false
                },
                {
                    "file-name" => "ReentrantLock.java",
                    "line-number" => 317,
                    "class-name" => "java.util.concurrent.locks.ReentrantLock",
                    "method-name" => "lockInterruptibly",
                    "native-method" => false
                },
                {
                    "file-name" => "BaseServer.java",
                    "line-number" => 277,
                    "class-name" => "org.jgroups.blocks.cs.BaseServer",
                    "method-name" => "getConnection",
                    "native-method" => false
                },
                {
                    "file-name" => "BaseServer.java",
                    "line-number" => 208,
                    "class-name" => "org.jgroups.blocks.cs.BaseServer",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "TCP.java",
                    "line-number" => 91,
                    "class-name" => "org.jgroups.protocols.TCP",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "BasicTCP.java",
                    "line-number" => 146,
                    "class-name" => "org.jgroups.protocols.BasicTCP",
                    "method-name" => "sendUnicast",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1638,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "sendToSingleMember",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1632,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "doSend",
                    "native-method" => false
                },
                {
                    "file-name" => "NoBundler.java",
                    "line-number" => 38,
                    "class-name" => "org.jgroups.protocols.NoBundler",
                    "method-name" => "sendSingleMessage",
                    "native-method" => false
                },
                {
                    "file-name" => "NoBundler.java",
                    "line-number" => 30,
                    "class-name" => "org.jgroups.protocols.NoBundler",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1620,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "send",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1353,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "_send",
                    "native-method" => false
                },
                {
                    "file-name" => "TP.java",
                    "line-number" => 1262,
                    "class-name" => "org.jgroups.protocols.TP",
                    "method-name" => "down",
                    "native-method" => false
                },
                {
                    "file-name" => "DNS_PING.java",
                    "line-number" => 189,
                    "class-name" => "org.jgroups.protocols.dns.DNS_PING",
                    "method-name" => "sendDiscoveryRequest",
                    "native-method" => false
                },
                {
                    "file-name" => "DNS_PING.java",
                    "line-number" => 182,
                    "class-name" => "org.jgroups.protocols.dns.DNS_PING",
                    "method-name" => "findMembers",
                    "native-method" => false
                },
                {
                    "file-name" => "Discovery.java",
                    "line-number" => 217,
                    "class-name" => "org.jgroups.protocols.Discovery",
                    "method-name" => "invokeFindMembers",
                    "native-method" => false
                },
                {
                    "file-name" => "Discovery.java",
                    "line-number" => 228,
                    "class-name" => "org.jgroups.protocols.Discovery",
                    "method-name" => "lambda$findMembers$0",
                    "native-method" => false
                },
                {
                    "file-name" => undefined,
                    "line-number" => -1,
                    "class-name" => "org.jgroups.protocols.Discovery$$Lambda$968/0x0000000840b0bc40",
                    "method-name" => "run",
                    "native-method" => false
                },
                {
                    "file-name" => "TimeScheduler3.java",
                    "line-number" => 324,
                    "class-name" => "org.jgroups.util.TimeScheduler3$Task",
                    "method-name" => "run",
                    "native-method" => false
                },
                {
                    "file-name" => "ContextReferenceExecutor.java",
                    "line-number" => 49,
                    "class-name" => "org.jboss.as.clustering.context.ContextReferenceExecutor",
                    "method-name" => "execute",
                    "native-method" => false
                },
                {
                    "file-name" => "ContextualExecutor.java",
                    "line-number" => 70,
                    "class-name" => "org.jboss.as.clustering.context.ContextualExecutor$1",
                    "method-name" => "run",
                    "native-method" => false
                },
                {
                    "file-name" => "Thread.java",
                    "line-number" => 829,
                    "class-name" => "java.lang.Thread",
                    "method-name" => "run",
                    "native-method" => false
                }
            ],
            "suspended" => false,
            "in-native" => false,
            "locked-monitors" => [],
            "locked-synchronizers" => []
        },
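
The waiting thread's `lock-owner-id` (109424945) is the `thread-id` of the hanging thread, so a dump of this size can be reduced to two numbers: how many threads have a DNS_PING frame, and how many threads each lock owner is blocking. A minimal sketch using the standard `java.lang.management` API; it assumes the JVM reports the owner of a `ReentrantLock` wait through `ThreadInfo.getLockOwnerId()`, which the WildFly output above appears to be based on. `ThreadDumpSummary` is a hypothetical name, and the code has to run inside the affected JVM (or be adapted to a remote JMX connection).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

public class ThreadDumpSummary {

    static final String DNS_PING = "org.jgroups.protocols.dns.DNS_PING";

    /** True if any frame of the stack belongs to the given class. */
    public static boolean hasFrameFrom(StackTraceElement[] stack, String className) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)) {
                return true;
            }
        }
        return false;
    }

    /** Snapshot of all live threads with stacks; held monitors/synchronizers are not needed here. */
    static ThreadInfo[] snapshot() {
        return ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
    }

    /** Counts threads whose stack contains a frame from the given class. */
    public static int countWithFrame(String className) {
        int count = 0;
        for (ThreadInfo info : snapshot()) {
            if (info != null && hasFrameFrom(info.getStackTrace(), className)) {
                count++;
            }
        }
        return count;
    }

    /** Maps each lock owner's thread id to the number of threads waiting on it. */
    public static Map<Long, Integer> waitersByOwner() {
        Map<Long, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : snapshot()) {
            // getLockOwnerId() is -1 when the thread is not waiting on an owned lock.
            if (info != null && info.getLockOwnerId() != -1) {
                counts.merge(info.getLockOwnerId(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.printf("%d of %d threads have a DNS_PING frame%n",
                countWithFrame(DNS_PING), snapshot().length);
        waitersByOwner().forEach((owner, n) ->
                System.out.printf("thread %d blocks %d thread(s)%n", owner, n));
    }
}
```

Sampling this periodically shows whether the DNS_PING count keeps climbing and whether it is always the same owner id that blocks the queue.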

Has anyone experienced a similar problem?
