I've noticed that the spliterator produced by Guava's Iterables.partition(collection, partitionSize).spliterator() behaves strangely.
Calling trySplit() on the resulting spliterator doesn't split, but calling trySplit() on the result of that initial trySplit() finally does.
Furthermore, StreamSupport.stream(Iterables.partition(collection, partitionSize).spliterator(), true) does not parallelize the stream, but
StreamSupport.stream(Iterables.partition(collection, partitionSize).spliterator().trySplit(), true) does parallelize, and the resulting stream contains all of the partitions.
My goal is: given a collection with size 100k I want to partition it into batches of size 5000 and process those batches in parallel.
Two questions: does the spliterator generated by Iterables.partition behave correctly, and is my approach a good way to achieve my goal?
The problem here is that the Spliterator comes from an Iterable, which does not have a known size. The implementation therefore buffers elements internally, 1024 at a time, increasing the batch size on each subsequent split. That is why the first trySplit() appears not to split usefully: it just copies everything it can into a buffer, and only the resulting array-backed spliterator splits properly.
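A small sketch of that buffering behaviour, without Guava: a plain Iterable with an unknown size behaves the same way as the one returned by Iterables.partition (the class and method names here are mine, for illustration only):

```java
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SpliteratorDemo {

    // Returns the estimated size of the prefix produced by the FIRST
    // trySplit() of an unknown-size spliterator over n elements.
    static long prefixSizeOf(int n) {
        // Viewing the list only as an Iterable hides its size, so the
        // default Iterable.spliterator() reports an unknown size.
        Iterable<Integer> iterable = IntStream.range(0, n)
                .boxed()
                .collect(Collectors.toList())::iterator;

        Spliterator<Integer> sp = iterable.spliterator();
        // estimateSize() is Long.MAX_VALUE here: the size is unknown.
        System.out.println("before split: " + sp.estimateSize());

        // trySplit() buffers up to 1024 elements into an array-backed
        // prefix spliterator; with n < 1024, it swallows everything.
        Spliterator<Integer> prefix = sp.trySplit();
        return prefix.estimateSize();
    }

    public static void main(String[] args) {
        // 100k elements in batches of 5000 give 20 partitions, well under
        // 1024, so the first split buffers all of them at once.
        System.out.println("prefix size = " + prefixSizeOf(20));
    }
}
```

This is why, in the question above, only the spliterator returned by the first trySplit() splits further: it is an array-backed spliterator with a known size.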
If you want to process 5000 elements at a time, you need to start with a Spliterator that has a known size to begin with. You could copy those partitions into an ArrayList first. On my machine, that shows the partitions being processed by separate threads.
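A sketch of that approach; to keep it dependency-free I use a hand-rolled subList-based partition method in place of Guava's Iterables.partition (the names PartitionDemo and partition are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionDemo {

    // Stand-in for Guava's Iterables.partition: consecutive sublists of
    // the given size (the last one may be shorter).
    static <T> List<List<T>> partition(List<T> source, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < source.size(); i += size) {
            parts.add(source.subList(i, Math.min(i + size, source.size())));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100_000)
                .boxed()
                .collect(Collectors.toList());

        // An ArrayList has a SIZED spliterator, so parallelStream() can
        // split the 20 batches evenly across worker threads.
        List<List<Integer>> batches = new ArrayList<>(partition(data, 5_000));

        Set<String> threads = ConcurrentHashMap.newKeySet();
        batches.parallelStream().forEach(batch -> {
            threads.add(Thread.currentThread().getName());
            // process the batch here
        });

        System.out.println(batches.size() + " batches processed by "
                + threads.size() + " thread(s)");
    }
}
```

The number of threads actually used depends on your machine's common ForkJoinPool, but with a sized spliterator the batches can at least be distributed, which is exactly what the unknown-size spliterator failed to do.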