Difference between publishOn and parallel

What is the difference between publishOn and parallel? As far as I understand, publishOn affects the rest of the operator chain by executing it on a couple of threads. But parallel seems to do exactly the same thing: by default it creates a thread pool whose size equals the number of cores and processes the operator chain in the same way. Or am I missing some subtlety?

The Schedulers section states:

[publishOn] takes signals from upstream and replays them downstream while executing the callback on a worker from the associated Scheduler

and,

a fixed pool of workers that is tuned for parallel work (Schedulers.parallel()). It creates as many workers as you have CPU cores.

Therefore, publishOn(Schedulers.parallel()) creates as many threads as there are cores and executes the subsequent operators in the chain with these workers.

But then the Parallel section states:

you can use the parallel() operator on any Flux. By itself, this method does not parallelize the work. Rather, it divides the workload into "rails" (by default, as many rails as there are CPU cores).

In order to tell the resulting ParallelFlux where to execute each rail (and, by extension, to execute rails in parallel) you have to use runOn(Scheduler). Note that there is a recommended dedicated Scheduler for parallel work: Schedulers.parallel().

Two examples below:

Flux.range(1, 10)
    .parallel(2)
    .runOn(Schedulers.parallel())
    .subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));

and

Flux.range(1, 10)
    .publishOn(Schedulers.parallel())
    .subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));

Are the two examples above equivalent, assuming the computer has two cores?

There is 1 best solution below

I think I have the answer to this.

After some testing I found that publishOn(Schedulers.parallel()) moves the processing of your Flux onto a separate thread (a single worker taken from the scheduler), but the elements are still processed in sequence.

In contrast, parallel().runOn(Schedulers.parallel()) splits your Flux into rails (by default, as many as you have CPU cores), and each rail is processed on its own worker, in parallel with the others. The elements within each rail are therefore still processed in sequence, but the rails together are processed in parallel.
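
One way to check this yourself is to record which threads actually run the downstream callback in each case. The following is only a minimal sketch, assuming reactor-core is on the classpath; the class name PublishOnVsParallel and both helper methods are made up for illustration and are not part of Reactor.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

// Hypothetical helper class, written only to illustrate the answer above.
public class PublishOnVsParallel {

    // Names of the threads that ran the callback downstream of publishOn.
    static Set<String> threadsUsedByPublishOn(int count) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        Flux.range(1, count)
            .publishOn(Schedulers.parallel())
            .doOnNext(i -> names.add(Thread.currentThread().getName()))
            .blockLast(); // block until completion so the result is fully collected
        return names;
    }

    // Names of the threads that ran the callback on the rails of a ParallelFlux.
    static Set<String> threadsUsedByParallel(int count, int rails) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        Flux.range(1, count)
            .parallel(rails)               // split the elements into rails
            .runOn(Schedulers.parallel())  // give each rail a worker from the scheduler
            .doOnNext(i -> names.add(Thread.currentThread().getName()))
            .sequential()                  // merge the rails back into a single Flux
            .blockLast();
        return names;
    }

    public static void main(String[] args) {
        System.out.println("publishOn: " + threadsUsedByPublishOn(10));
        System.out.println("parallel:  " + threadsUsedByParallel(10, 2));
    }
}
```

With publishOn the set should contain a single thread name, because the operator takes one worker from the scheduler and delivers every element on it. With parallel(2).runOn(...) you would normally expect two names on a machine with at least two cores, one per rail. So the two examples in the question are not equivalent, even on a two-core machine.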