What is the difference between publishOn and parallel? As far as I understand publishOn influences the further processing of the operators chain by executing it using couple of threads. But parallel seems to work exactly the same as it. By default it creates a thread pool of size == #cores and processes the operator's chain the same. Or am I missing some subtlety?
The Schedulers section states:
[publishOn] takes signals from upstream and replays them downstream while executing the callback on a worker from the associated Scheduler
and,
a fixed pool of workers that is tuned for parallel work (Schedulers.parallel()). It creates as many workers as you have CPU cores.
Therefore publishOn(Schedulers.parallel() creates as many threads are cores and executes the subsequent operators in the chain with these workers.
But then the Parralel section states:
you can use the parallel() operator on any Flux. By itself, this method does not parallelize the work. Rather, it divides the workload into "rails" (by default, as many rails as there are CPU cores).
In order to tell the resulting ParallelFlux where to execute each rail (and, by extension, to execute rails in parallel) you have to use runOn(Scheduler). Note that there is a recommended dedicated Scheduler for parallel work: Schedulers.parallel().
Two examples below:
Flux.range(1, 10)
.parallel(2)
.runOn(Schedulers.parallel())
.subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));
and
Flux.range(1, 10)
.publishOn(Schedulers.parallel())
.subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));
Are the above two examples equivalent assuming the computer has two cores?
I think I have the answer to this.
After some testing I found out that
publishOn(Schedulers.parallel()will process yourFluxon separate threads but in sequence.But,
using
parallel().runOn(Schedulers.parallel())will split yourFluxinto amount of cores you have, and each of these newFluxeswill be processed in parallel. Therefore the data in each of the newFluxeswill probably be processed in sequence but theFluxestogether will be processes in parallel.