While analyzing the logs of one of the production environments, I saw a thread in "WAITING" state on a countdownlatch await()
...sun.misc.Unsafe.park(Native Method)
...java.util.concurrent.locks.LockSupport.park(Unknown Source)
...java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
...java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
...java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
...java.util.concurrent.CountDownLatch.await(Unknown Source)
The latch was initialized to 1 and another thread did call countDown() method on the same instance of the latch but still the main thread remains blocked on the latch. This has lead to the jvm being hung indefinitely.
Getting blocked even when the latch count reaches zero sounds unreasonable and I am looking out for suggestions on further troubleshooting this issue.
Any ideas?
Note - The jvm version used is as below
java version "1.5.0_15" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04) Java HotSpot(TM) Client VM (build 1.5.0_15-b04, mixed mode, sharing)
Update - Below is the code snippet of the thread I am talking about above
private class MyRunnable implements Runnable, Thread.UncaughtExceptionHandler {
private AtomicBoolean shouldStop = new AtomicBoolean(false);
private CountDownLatch stopLatch = new CountDownLatch(1);
private Thread currentThread;
public void run() {
Thread.currentThread().setName("My Thread");
Thread.currentThread().setUncaughtExceptionHandler(this);
currentThread = Thread.currentThread();
if (currentThread.isInterrupted()) {
logger.debug("The pool thread had its interrupted stattus set. Clearing...");
Thread.interrupted();
logger.debug("The pool thread had its interrupted stattus set. Clearing...DONE");
}
try {
doBusinessLogic(shouldStop);
} catch (Exception e) {
logger.error("An exception was encountered in the thread", e);
} finally {
if (currentThread.isInterrupted()) {
logger.debug("Clearing interupted status for the thread and returning to pool...");
Thread.interrupted();
}
stopLatch.countDown();
logger.debug("Stopped task after counting down on the latch");
}
}
public void stopThread() {
shouldStop.set(true);
logger.debug("Stop flag was set to true.. waiting for thread method to return...");
try {
stopLatch.await();
logger.debug("Stop flag was set to true... task has finished. Returning.");
} catch (InterruptedException e) {
logger.error("Interrupted while awaiting thread stop event...", e);
}
}
public void uncaughtException(Thread t, Throwable e) {
logger.error("An uncaught exception occurred in the task thread ", e);
}
private void doBusinessLogic(AtomicBoolean shouldStop) {
long sleepPeriod = 11;
while (!shouldStop.get()) {
try {
Thread.sleep(sleepPeriod);
} catch (InterruptedException e) {
logger.debug("Thread was interrupted.Clearing interrupted status and proceeding", e);
if (Thread.currentThread().isInterrupted())
Thread.interrupted();
}
if (shouldStop.get()) {
logger.debug("Stop flag was set. Returning.");
return;
}
try {
logger.debug("Performing business logic...");
//.....
logger.debug("Performing business logic...DONE");
} catch (Throwable e) {
logger.error("An exception occurred", e);
}
if (shouldStop.get()) {
logger.debug("Stop flag was set. Returning.");
return;
}
}
}
}
Here is what I see in the logs
DEBUG [main Thread] - Stop flag was set to true.. waiting for thread method to return...
DEBUG [My Thread] - Stop flag was set. Returning.
DEBUG [My Thread] - Stopped task after counting down on the latch
The logger statement after latch.await() is never printed and the thread dump also indicates that the main thread is blocked on the latch.
I would have liked to write this as a comment but my reputation doesn't allow commenting yet, so I'll offer my small experience here.
I run into the same issue. My countDownLatch was being accessed from synchronized methods: removing the synchronization solved the problem (while of course I had to adjust the methods' code to cope with the absence of synchronization). I've no idea how to explain this behaviour.