Why is acquiring an already-held lock slower on JDK 21 than on JDK 11?

While optimizing some locking code, I used a JMH benchmark to see how much locking an already-locked ReentrantLock costs compared to locking it just once. I was surprised to see that JDK 11 performed better than JDK 21. It would be really nice to understand why, and whether my benchmark is correct at all.

I also added benchmarks with a synchronized block and without any locking at all. As expected, the synchronized block is optimized and performs almost the same as the lock-free one, and there is no degradation between JDK versions.

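import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)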
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LockNoLockBenchmark {
  int counter;

  ReentrantLock lock = new ReentrantLock();


  @Benchmark
  public void noLock() {
    ++counter;
  }

  @Benchmark
  public void syncLock() {
    synchronized (new Object()) {
      ++counter;
    }
  }

  @Benchmark
  public void lockUnlock() {
    lock.lock();
    try {
      ++counter;
    } finally {
      lock.unlock();
    }
  }

  @Benchmark
  public void lockLockUnlockUnlock() {
    lock.lock();
    try {
      lock.lock();
      try {
        ++counter;
      } finally {
        lock.unlock();
      }
    } finally {
      lock.unlock();
    }
  }
}

Run on a 12th Gen Intel(R) Core(TM) i9-12950HX (Alder Lake, Core i9), 12 cores, 64 GB RAM

  1. JDK 21
openjdk 21.0.2 2024-01-16
OpenJDK Runtime Environment (build 21.0.2+13-58)
OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)

Benchmark                                 Mode  Cnt   Score   Error  Units
LockNoLockBenchmark.lockLockUnlockUnlock  avgt   10  27.457 ± 0.876  ns/op
LockNoLockBenchmark.lockUnlock            avgt   10  11.409 ± 0.256  ns/op
LockNoLockBenchmark.noLock                avgt   10   0.280 ± 0.010  ns/op
LockNoLockBenchmark.syncLock              avgt   10   0.280 ± 0.008  ns/op

  2. JDK 11
openjdk 11.0.21 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Benchmark                                 Mode  Cnt   Score   Error  Units
LockNoLockBenchmark.lockLockUnlockUnlock  avgt   10  22.414 ± 1.366  ns/op
LockNoLockBenchmark.lockUnlock            avgt   10  11.690 ± 0.407  ns/op
LockNoLockBenchmark.noLock                avgt   10   0.283 ± 0.021  ns/op
LockNoLockBenchmark.syncLock              avgt   10   0.289 ± 0.012  ns/op

I'd expect no performance degradation for this case with JDK 21. I am also interested in ways to optimize code that needs to acquire a lock it already holds. Thank you.

Best answer

In JDK 14, there was a massive rewrite of java.util.concurrent internals in the context of JDK-8229442. The goal was to improve overall performance of concurrent primitives and prepare the implementation for virtual threads.

However, as often happens, improvements in one scenario are accompanied by a regression in another.

In JDK 11, the code for recursive locking looks as follows. It has a fast path for checking if the lock is owned by the current thread. Note that there is no atomic compareAndSet operation on this path.

final boolean nonfairTryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) {
        int nextc = c + acquires;
        if (nextc < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}

In JDK 21, the code looks a bit different. initialTryLock always executes compareAndSetState first, whether the acquisition is recursive or not, and that's where the performance difference comes from.

final boolean initialTryLock() {
    Thread current = Thread.currentThread();
    if (compareAndSetState(0, 1)) { // first attempt is unguarded
        setExclusiveOwnerThread(current);
        return true;
    } else if (getExclusiveOwnerThread() == current) {
        int c = getState() + 1;
        if (c < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(c);
        return true;
    } else
        return false;
}
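
To make the difference between the two code shapes concrete, here is a minimal single-threaded sketch that counts CAS attempts. It is a toy model, not the JDK source: the class RecursiveAcquireModel, its fields, and the casAttempts counter are hypothetical names introduced for illustration, and the overflow check is omitted.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the two acquire paths; NOT the real AbstractQueuedSynchronizer.
// All names here are hypothetical and exist only for this illustration.
final class RecursiveAcquireModel {
    final AtomicInteger state = new AtomicInteger(); // hold count, 0 = free
    Thread owner;                                    // current owner, if any
    int casAttempts;                                 // how many CAS calls were issued

    private boolean cas(int expect, int update) {
        casAttempts++;
        return state.compareAndSet(expect, update);
    }

    // JDK 11 shape: read the state first, CAS only when the lock looks free.
    boolean acquireCheckFirst() {
        Thread current = Thread.currentThread();
        int c = state.get();
        if (c == 0) {
            if (cas(0, 1)) {
                owner = current;
                return true;
            }
        } else if (owner == current) {
            state.set(c + 1); // recursive acquire: plain write, no CAS
            return true;
        }
        return false;
    }

    // JDK 21 shape: the first CAS is unguarded, so a recursive acquire
    // pays for one failed CAS before reaching the owner check.
    boolean acquireCasFirst() {
        Thread current = Thread.currentThread();
        if (cas(0, 1)) {
            owner = current;
            return true;
        } else if (owner == current) {
            state.set(state.get() + 1);
            return true;
        }
        return false;
    }
}
```

Acquiring twice from the same thread issues one CAS attempt with acquireCheckFirst() and two with acquireCasFirst(); the extra, failing attempt on the recursive acquire is the cost that shows up in lockLockUnlockUnlock.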

The aforementioned refactoring already caused a performance regression earlier, which was later fixed. If your question arose from a real issue in production, you're welcome to submit a bug report.
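
As for optimizing code that has to acquire a lock it may already hold, one option worth measuring is to check ReentrantLock.isHeldByCurrentThread() and skip the nested lock()/unlock() pair when it returns true. The sketch below assumes the nested critical section always runs inside the outer one on the same thread; the class GuardedReentry and its method names are hypothetical, and whether this is actually faster should be confirmed with your own benchmark.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class: avoids re-entrant lock()/unlock() calls
// by checking ownership first.
final class GuardedReentry {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    /** Increments the counter, taking the lock only if this thread does not hold it yet. */
    int increment() {
        if (lock.isHeldByCurrentThread()) {
            // Already inside the critical section: no nested lock()/unlock() needed.
            return ++counter;
        }
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }

    /** Outer critical section that calls increment() while holding the lock. */
    int incrementNested() {
        lock.lock();
        try {
            return increment();
        } finally {
            lock.unlock();
        }
    }

    /** Number of holds by the calling thread; 0 when the lock is not held. */
    int holdCount() {
        return lock.getHoldCount();
    }
}
```

The trade-off is that the hold count no longer reflects nesting depth, so this only fits code that does not rely on getHoldCount() or on unlocking from the inner section.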

As a side note, your syncLock() benchmark does not actually measure the performance of synchronized: locking on a local, non-escaping object is a no-op, and the JIT compiler happily eliminates the unnecessary locking altogether.
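
A sketch of how the synchronized case could be restructured so that the monitor is not trivially eliminated: lock on an object stored in a field, which other threads could reach, instead of a freshly allocated local. The class SyncTargets and its member names are hypothetical; in a JMH benchmark these bodies would go into @Benchmark methods of the @State class.

```java
// Hypothetical example class contrasting the two synchronized targets.
final class SyncTargets {
    // Stored in a field, so the object is visible beyond a single method call
    // and escape analysis cannot treat it as thread-local.
    private final Object sharedMonitor = new Object();
    int counter;

    // Same shape as syncLock() in the question: the monitor never leaves
    // the method, so the JIT is free to drop the locking entirely.
    void syncOnLocal() {
        synchronized (new Object()) {
            ++counter;
        }
    }

    // Locks on the shared field, so the monitor operations have to be kept
    // (the JIT may still coarsen or otherwise optimize them).
    void syncOnShared() {
        synchronized (sharedMonitor) {
            ++counter;
        }
    }
}
```

Both methods have the same visible effect on counter; the difference is only in what the JIT is allowed to remove, which is why the timings of the original syncLock() matched noLock().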