estimateSize() on sequential Spliterator


I'm implementing a Spliterator that explicitly restricts parallelization by having trySplit() return null. Would implementing estimateSize() offer any performance improvements for a stream produced by this spliterator? Or is the estimated size only useful for parallelization?

EDIT: To clarify, I'm specifically asking about an estimated size. In other words, my spliterator does not have the SIZED characteristic.

There are 2 best solutions below

6
On BEST ANSWER

Looking at the call hierarchy of the relevant spliterator characteristic (SIZED) reveals that it is at least relevant for stream.toArray() performance.

Additionally, there is an equivalent flag in the internal stream implementation (StreamOpFlag.SIZED) that seems to be used for sorting.

So, aside from parallel stream operations, the exact size seems to be used for those two operations.

I don't claim exhaustiveness for my search, so just take these as examples.


Without the SIZED characteristic, I can only find calls to estimateSize() that are relevant to parallel execution of the stream pipeline.

Of course, this might change in the future, or a Stream implementation other than the standard JDK one could behave differently.
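One way to check this on a given JDK is to count how often estimateSize() is actually invoked. The following is a minimal sketch, not a definitive test: the class names EstimateSizeProbe and CountingSpliterator are hypothetical, and the spliterator deliberately omits SIZED and returns null from trySplit(), as described in the question. The printed call count depends on the stream implementation in use, so no particular value is claimed here.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

public class EstimateSizeProbe {

    /** Hypothetical sequential-only spliterator over 0..limit-1 that counts estimateSize() calls. */
    static final class CountingSpliterator implements Spliterator<Integer> {
        private final int limit;
        private int next = 0;
        final AtomicInteger estimateCalls = new AtomicInteger();

        CountingSpliterator(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next >= limit) {
                return false;
            }
            action.accept(next++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            return null; // explicitly forbids splitting, hence no parallelism
        }

        @Override
        public long estimateSize() {
            estimateCalls.incrementAndGet();
            return limit - next; // an estimate only; SIZED is not reported
        }

        @Override
        public int characteristics() {
            return ORDERED | NONNULL; // deliberately without SIZED
        }
    }

    /** Runs a sequential stream over the spliterator and collects it with toArray(). */
    static Object[] drain(CountingSpliterator s) {
        return StreamSupport.stream(s, false).toArray();
    }

    public static void main(String[] args) {
        CountingSpliterator s = new CountingSpliterator(5);
        System.out.println(Arrays.toString(drain(s)));
        System.out.println("estimateSize() calls: " + s.estimateCalls.get());
    }
}
```

Adding SIZED to characteristics() and rerunning the probe is a quick way to compare the two cases on the JDK you actually deploy on.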

1
On

A spliterator may traverse elements:

1. Individually (tryAdvance())

2. Sequentially in bulk (forEachRemaining())
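The two traversal modes can be sketched as follows, using the spliterator of an ordinary List. The class name TraversalModes and the "first:"/"rest:" labels are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class TraversalModes {

    /** Consumes one element with tryAdvance(), then the remainder with forEachRemaining(). */
    static List<String> traverse(List<String> source) {
        Spliterator<String> sp = source.spliterator();
        List<String> seen = new ArrayList<>();

        // Individual traversal: processes at most one element per call.
        sp.tryAdvance(e -> seen.add("first:" + e));

        // Bulk traversal: processes everything that is left, in order.
        sp.forEachRemaining(e -> seen.add("rest:" + e));

        return seen;
    }

    public static void main(String[] args) {
        // prints [first:a, rest:b, rest:c]
        System.out.println(traverse(Arrays.asList("a", "b", "c")));
    }
}
```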

According to the Javadoc, estimateSize() comes in handy during splitting:

Spliterators can provide an estimate of the number of remaining elements via the estimateSize() method. Ideally, as reflected in characteristic SIZED, this value corresponds exactly to the number of elements that would be encountered in a successful traversal. However, even when not exactly known, an estimated value may still be useful to operations being performed on the source, such as helping to determine whether it is preferable to split further or traverse the remaining elements sequentially.

Since your spliterator does not have the SIZED characteristic, estimateSize() will not offer any performance benefit (because there is no parallelism). However, keep in mind that the Javadoc of estimateSize() itself doesn't mention parallelism; all it states is:

Returns: the estimated size, or Long.MAX_VALUE if infinite, unknown, or too expensive to compute.
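The difference between an estimate and an exact size is visible through Spliterator.getExactSizeIfKnown(), whose documented default returns estimateSize() only when SIZED is reported and -1 otherwise. The sketch below uses the JDK factory methods Spliterators.spliteratorUnknownSize() and Spliterators.spliterator(); the class name SizeReporting and its helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

public class SizeReporting {

    /** Spliterator with no size information: not SIZED, estimate is Long.MAX_VALUE. */
    static Spliterator<String> unknownSize(List<String> source) {
        return Spliterators.spliteratorUnknownSize(source.iterator(), Spliterator.ORDERED);
    }

    /** Spliterator created with an explicit size: this factory reports SIZED. */
    static Spliterator<String> knownSize(List<String> source) {
        return Spliterators.spliterator(source.iterator(), source.size(), Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");

        Spliterator<String> unknown = unknownSize(data);
        System.out.println(unknown.hasCharacteristics(Spliterator.SIZED)); // false
        System.out.println(unknown.estimateSize() == Long.MAX_VALUE);      // true
        System.out.println(unknown.getExactSizeIfKnown());                 // -1

        Spliterator<String> known = knownSize(data);
        System.out.println(known.hasCharacteristics(Spliterator.SIZED));   // true
        System.out.println(known.estimateSize());                          // 3
        System.out.println(known.getExactSizeIfKnown());                   // 3
    }
}
```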