How to solve the JVM hung issue in Apache Camel when there is a connection issue to IBM MQ?

1.1k Views Asked by At

We are seeing the frequent JVM hung in Apache camel based application whenever there is a network/connection issue to the IBM MQ.

The logger clearly says the connection problem happened and Spring CachingConnectionFactory is trying to reset the underlying MQ connection. Looks like there is an hand-off issue between Spring and IBM MQ Lib when resetting the connection.

Jul 13, 2018 8:51:48 PM org.springframework.jms.connection.CachingConnectionFactory onException
WARNING: Encountered a JMSException - resetting the underlying JMS Connection
com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred.
An error has occurred with the IBM MQ JMS connection.
Caused by: com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
    at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:203)
    ... 220 more
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2009

At exactly the same timestamp, the JVM gets hung and DMLC is no longer processing the messages. But I do see the 20 listeners count on the consumer queue.

I took a thread dump of the process and I see hung/blocked threads which causes the JVM to be hung.

enter image description here

Here is the JMSCCThreadPoolWorker stacktrace which is waiting because of blocked thread.

JMSCCThreadPoolWorker-727742 
Stack Trace is: 
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006d75964c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at com.ibm.msg.client.jms.internal.JmsSessionImpl$ReentrantDoubleLock.getExclusiveLock(JmsSessionImpl.java:4931)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2521)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2498)
at com.ibm.msg.client.jms.internal.JmsConnectionImpl.stop(JmsConnectionImpl.java:1263)
- locked <0x00000006ca146118> (a com.ibm.msg.client.jms.internal.State)
at com.ibm.mq.jms.MQConnection.stop(MQConnection.java:473)
at org.springframework.jms.connection.SingleConnectionFactory.closeConnection(SingleConnectionFactory.java:452)
at org.springframework.jms.connection.SingleConnectionFactory.resetConnection(SingleConnectionFactory.java:345)
- locked <0x00000006cfba30c8> (a java.lang.Object)
at org.springframework.jms.connection.CachingConnectionFactory.resetConnection(CachingConnectionFactory.java:207)
at org.springframework.jms.connection.SingleConnectionFactory.onException(SingleConnectionFactory.java:323)
at org.springframework.jms.connection.SingleConnectionFactory$AggregatedExceptionListener.onException(SingleConnectionFactory.java:673)
- locked <0x00000006cfba30c8> (a java.lang.Object)
at com.ibm.msg.client.jms.internal.JmsProviderExceptionListener.run(JmsProviderExceptionListener.java:413)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:319)
at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:99)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:343)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:312)
at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1227)
Locked ownable synchronizers:
- <0x00000006ca05eb40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Here is the stacktrace of blocked thread:

Stack Trace is: 
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.jms.connection.SingleConnectionFactory.getConnection(SingleConnectionFactory.java:281)
- waiting to lock <0x00000006cfba30c8> (a java.lang.Object)
at org.springframework.jms.connection.SingleConnectionFactory.createConnection(SingleConnectionFactory.java:224)
at org.springframework.jms.connection.JmsTransactionManager.createConnection(JmsTransactionManager.java:288)
at org.springframework.jms.connection.JmsTransactionManager.doBegin(JmsTransactionManager.java:186)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.handleExistingTransaction(AbstractPlatformTransactionManager.java:429)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:349)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:120)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.FatalFallbackErrorHandler.process(FatalFallbackErrorHandler.java:81)
at org.apache.camel.processor.RedeliveryErrorHandler.deliverToFailureProcessor(RedeliveryErrorHandler.java:1057)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler$1.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:34)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:542)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler$1.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:34)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:114)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:120)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:112)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:721)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:681)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:651)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:317)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:235)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1166)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1060)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000006d0b19600> (a java.util.concurrent.ThreadPoolExecutor$Worker)
1

There are 1 best solutions below

2
On BEST ANSWER

You need to figure out which thread owns ReentrantLock < 0x00000006d75964c0>; it won't show in the stack trace.

It's probably a deadlock in the IBM client - either a thread has failed to unlock it or they have a lock order problem.

You might look for a thread that is blocked trying to synchronize on

- waiting to lock <0x00000006ca146118> (a com.ibm.msg.client.jms.internal.State)

It could be that that thread has already acquired the ReentrantLock. If no such thread exists then it's likely the former case (failure to unlock).

In any case, the deadlock appears to be in the IBM code. Tough to debug since (last time I looked) the MQ client is closed-source. You might need to open a ticket with IBM; assuming you have support there.