What is the difference between publishOn
and parallel
? As far as I understand publishOn
influences the further processing of the operators chain by executing it using couple of threads. But parallel
seems to work exactly the same as it. By default it creates a thread pool of size == #cores and processes the operator's chain the same. Or am I missing some subtlety?
The Schedulers section states:
[publishOn] takes signals from upstream and replays them downstream while executing the callback on a worker from the associated Scheduler
and,
a fixed pool of workers that is tuned for parallel work (Schedulers.parallel()). It creates as many workers as you have CPU cores.
Therefore publishOn(Schedulers.parallel()
creates as many threads are cores and executes the subsequent operators
in the chain with these workers.
But then the Parralel section states:
you can use the parallel() operator on any Flux. By itself, this method does not parallelize the work. Rather, it divides the workload into "rails" (by default, as many rails as there are CPU cores).
In order to tell the resulting ParallelFlux where to execute each rail (and, by extension, to execute rails in parallel) you have to use runOn(Scheduler). Note that there is a recommended dedicated Scheduler for parallel work: Schedulers.parallel().
Two examples below:
Flux.range(1, 10)
.parallel(2)
.runOn(Schedulers.parallel())
.subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));
and
Flux.range(1, 10)
.publishOn(Schedulers.parallel())
.subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));
Are the above two examples equivalent assuming the computer has two cores?
I think I have the answer to this.
After some testing I found out that
publishOn(Schedulers.parallel()
will process yourFlux
on separate threads but in sequence.But,
using
parallel().runOn(Schedulers.parallel())
will split yourFlux
into amount of cores you have, and each of these newFluxes
will be processed in parallel. Therefore the data in each of the newFluxes
will probably be processed in sequence but theFluxes
together will be processes in parallel.