Java 21 virtual thread executor performing worse than executor with pooled OS threads?

I have just upgraded our Spring Boot applications to Java 21. As part of that, I have also switched to virtual threads, both for serving API requests and for async operations that we run internally on executors.

For one use case, an Executor backed by virtual threads seems to perform worse than a ForkJoinPool backed by OS threads. This use case sets some MDC values and calls an external system over HTTP.

This is my pseudo-ish code:

List<...> ... = executorService.submit(
                () -> IntStream.rangeClosed(-from, to)
                        .mapToObj(i -> ...)
                        .parallel()
                        .map(... -> {
                            try {
                                service.setSomeThreadLocalString(...);
                                MDC.put(..., ...);
                                MDC.put(..., ...);

                                return service.call(...);
                            } finally {
                                service.removeSomeThreadLocalString(...);
                                MDC.remove(...);
                                MDC.remove(...);
                            }
                        })
                        .toList())
        .get();

Here, executorService is either:

  1. new ForkJoinPool(30)
  2. Executors.newVirtualThreadPerTaskExecutor()

Option 1 seems to perform a lot better than option 2; sometimes it is twice as fast. I ran this test in a Java 21 environment with 10 parallel executions: option 1 normally takes 800-1000 ms, while option 2 takes 1500-2000 ms.

If it makes any difference, I have this property enabled in Spring Boot:

spring:
  threads:
    virtual:
      enabled: true

Any ideas why this is happening?

Answer by Holger (accepted):

You are assuming that submitting a parallel stream operation as a job to another executor service will make the Stream implementation use that executor service. This is not the case.

There is an undocumented trick to make a parallel stream operation use a different Fork/Join pool by initiating it from a worker thread of that pool. But the executor service producing virtual threads is not a Fork/Join pool.

So when you initiate the parallel stream operation from a virtual thread, the parallel stream will use the common pool for the operation. In other words, apart from the one initiating virtual thread, you are still using platform threads (the Stream implementation also performs part of the work in the caller thread).
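The undocumented trick can be illustrated with a small sketch. It relies on unspecified JDK behavior (a parallel stream started on a ForkJoinPool worker thread forks its subtasks into that worker's pool), and the class and method names are hypothetical. This is presumably also why option 1 in the question could spread the calls over up to 30 worker threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamInCustomPool {

    // Runs a parallel stream from inside a worker of a dedicated ForkJoinPool and
    // returns the names of all threads that processed elements.
    static Set<String> threadsUsed(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        Set<String> names = ConcurrentHashMap.newKeySet();
        try {
            // The stream is started on a worker of 'pool', so its subtasks are forked
            // into 'pool' instead of ForkJoinPool.commonPool() (unspecified behavior).
            pool.submit(() -> IntStream.rangeClosed(1, 1_000)
                    .parallel()
                    .forEach(i -> names.add(Thread.currentThread().getName())))
                .get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return names;
    }

    public static void main(String[] args) {
        // Expect names like "ForkJoinPool-1-worker-1", not "ForkJoinPool.commonPool-worker-1"
        threadsUsed(30).forEach(System.out::println);
    }
}
```

Running it should print only names of the form ForkJoinPool-N-worker-M; a name of the form ForkJoinPool.commonPool-worker-M would indicate that the common pool was used instead.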

So when I use the following program

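import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class ParallelStreamInsideVirtualThread {
    public static void main(String[] args) throws Exception {
        var executorService = Executors.newVirtualThreadPerTaskExecutor();
        // The submitted job itself runs on a virtual thread...
        var job = executorService.submit(
            () -> {
              Thread init = Thread.currentThread();
              // ...but the parallel stream it starts forks its work into the common pool
              return IntStream.rangeClosed(0, 10).parallel()
                 .peek(x -> printThread(init))
                 .mapToObj(String::valueOf)
                 .toList();
            });
        job.get();
    }

    // Prints whether the current thread is virtual or platform, and either marks it
    // as the initiating thread or prints its name
    static void printThread(Thread initial) {
        Thread t = Thread.currentThread();
        System.out.println((t.isVirtual()? "Virtual  ": "Platform ")
            + (t == initial? "(initiator)": t.getName()));
    }
}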

it will print something like

Virtual  (initiator)
Virtual  (initiator)
Platform ForkJoinPool.commonPool-worker-1
Platform ForkJoinPool.commonPool-worker-3
Platform ForkJoinPool.commonPool-worker-2
Platform ForkJoinPool.commonPool-worker-4
Virtual  (initiator)
Platform ForkJoinPool.commonPool-worker-1
Platform ForkJoinPool.commonPool-worker-3
Platform ForkJoinPool.commonPool-worker-5
Platform ForkJoinPool.commonPool-worker-2

In short, you are not measuring the performance of virtual threads at all.
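To actually measure virtual threads, each call has to run on its own virtual thread instead of inside a parallel stream. The following is a minimal sketch, assuming the work is dominated by blocking I/O: it submits one task per element to Executors.newVirtualThreadPerTaskExecutor() and collects the results in order. CONTEXT (a plain ThreadLocal standing in for MDC and service.setSomeThreadLocalString) and call (which simulates the HTTP call with Thread.sleep) are hypothetical placeholders for the question's code.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class CallPerVirtualThread {

    // Hypothetical stand-in for the question's thread-local state (MDC, service.setSomeThreadLocalString)
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Hypothetical stand-in for service.call(...): simulates a blocking HTTP call
    static String call(int i) throws InterruptedException {
        Thread.sleep(100);
        return i + " on " + (Thread.currentThread().isVirtual() ? "virtual" : "platform")
                + " ctx=" + CONTEXT.get();
    }

    static List<String> callAll(int from, int to) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // One task, and therefore one virtual thread, per element; no parallel stream involved
            List<Future<String>> futures = IntStream.rangeClosed(from, to)
                    .mapToObj(i -> executor.submit(() -> {
                        CONTEXT.set("request-" + i);
                        try {
                            return call(i);
                        } finally {
                            // Clean up per task, mirroring the question's finally block
                            CONTEXT.remove();
                        }
                    }))
                    .toList();
            // All tasks are submitted before we wait on any of them; results keep their order
            return futures.stream().map(CallPerVirtualThread::join).toList();
        }
    }

    static <T> T join(Future<T> f) {
        try {
            return f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        callAll(1, 10).forEach(System.out::println);
        System.out.printf("took %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}
```

Because a virtual thread unmounts from its carrier while it sleeps or blocks on I/O, the ten simulated 100 ms calls should overlap even on a single carrier thread, so the total time should be roughly 100 ms rather than 1 s.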

Answer by Mark Rotteveel:

As Holger indicates in the comment below and in their answer, the real problem is that the parallel stream does not run on virtual threads at all, but on the common pool. So although the CPU count (and -XX:ActiveProcessorCount) will influence performance, because it also determines the parallelism of the common pool, setting jdk.virtualThreadScheduler.parallelism is unlikely to make much difference here.

I'll leave my answer for reference, but be aware that it doesn't fully apply to this specific case.


As you indicate in the comments that you're using a Kubernetes container configured with 1 CPU, the problem is that effectively you're comparing a fork-join pool with a parallelism of 30 with virtual thread execution with a parallelism of 1.

By default, Java schedules virtual threads on a pool of OS threads (a work-stealing fork-join pool) whose parallelism equals the number of available processors (see also JEP 444: Virtual Threads). If you're running on Kubernetes and the pod is limited to 1 CPU (or less than 1 CPU), Java will assume a processor count of 1.

You can tell Java to assume a different number of processors by passing the -XX:ActiveProcessorCount=n setting, with n the number of processors Java should assume. Alternatively, or additionally, you can set the system property jdk.virtualThreadScheduler.parallelism (using -Djdk.virtualThreadScheduler.parallelism=n where n is the desired parallelism). Both of these need to be passed to the java executable on the command line.

Note that you probably shouldn't set it much higher than your CPU quota; otherwise you'll likely just give Kubernetes reason to throttle your pod. Reasonable values will depend on the actual load behaviour of your application.
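To check what the JVM actually sees inside the pod, a small diagnostic program like the following (the class name is hypothetical) can be run with the same flags as the application. It prints the processor count, the common pool parallelism that governs parallel streams, and the jdk.virtualThreadScheduler.parallelism property if it was passed.

```java
import java.util.concurrent.ForkJoinPool;

public class SchedulerSettings {

    // Processor count the JVM assumes (affected by container limits and -XX:ActiveProcessorCount)
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors          = " + availableProcessors());
        // Parallel streams run on the common pool; its default parallelism is
        // typically one less than the processor count
        System.out.println("commonPool parallelism       = "
                + ForkJoinPool.commonPool().getParallelism());
        // Only present if passed with -D; otherwise the virtual-thread scheduler
        // defaults to the processor count
        System.out.println("virtualThreadScheduler.par.  = "
                + System.getProperty("jdk.virtualThreadScheduler.parallelism", "(not set)"));
    }
}
```

For example, launching it in source-file mode as java -XX:ActiveProcessorCount=4 -Djdk.virtualThreadScheduler.parallelism=4 SchedulerSettings.java should report 4 available processors and a property value of 4.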