Java Parallel Stream Slower than Serial

143 Views Asked by At

I have a database record of around 1000000 paragraphs with around ~500 characters each. By reading all the records, I need to get the list of alphabet ordered by most to least used.

I mock the database reading by creating stream up to 1000000 then process the stream in parallel

final Map<Character, Long> charCountMap = new ConcurrentHashMap<>();
for (char c = 'a'; c <= 'z'; c++) {
    charCountMap.put(c, 0l);
}

System.out.println("Parallel Stream");
long start = System.currentTimeMillis();
Stream.iterate(0, i -> i).limit(1000000).parallel() //mock database stream
    .forEach(i-> RandomStringUtils.randomAlphanumeric(500)
    .toLowerCase().chars().mapToObj(c -> Character.valueOf((char) c)).filter(c -> c >= 97 && c <= 122)
    .forEach(c -> charCountMap.compute(c, (k, v) -> v + 1))); //update ConcurrentHashMap

long end = System.currentTimeMillis();
System.out.println("Parallel Stream time spent :" + (end - start));

System.out.println("Serial Stream"); start = System.currentTimeMillis();
Stream.iterate(0, i -> i).limit(1000000) //mock database stream
    .forEach(i-> RandomStringUtils.randomAlphanumeric(500)
    .toLowerCase().chars().mapToObj(c -> Character.valueOf((char) c)).filter(c -> c >= 97 && c <= 122)
    .forEach(c -> charCountMap.compute(c, (k, v) -> v + 1)));
end = System.currentTimeMillis();
System.out.println("Serial Stream time spent :" + (end - start));

I initially thought that parallel stream would be faster even with expected overhead for stream larger than 100,000. However, test shows that serial stream is ~5X faster than parallel even for 1,000,000 records.

I suspected it was because of updating the ConcurrentHashMap. But when I removed it and change with empty function, there is still significant performance gap.

Is there something wrong in my database mock up call or the way I use parallel stream?

1

There are 1 best solutions below

0
On

Using RandomStringUtils.randomAlphanumeric(500) isn't suitable for use with parallel() as according to the code here it uses a static variable for the random string generation. Therefore all the calls from all the threads to generate a random string will have contention on the same underlying instance of Random:

private static final Random RANDOM = new Random();

Write your own random string generator which uses single instance of Random per thread or has use of java.util.concurrent.ThreadLocalRandom - this avoids contention on the random sequences. The same issue causes poor performance in this question before it was edited to use ThreadLocalRandom.

See javadoc for java.util.Random says:

Instances of java.util.Random  are threadsafe.
However, the concurrent use of the same java.util.Random 
instance across threads may encounter contention and consequent
poor performance. Consider instead using 
java.util.concurrent.ThreadLocalRandom in multithreaded
designs.