I am trying to initialize an array in Scala, using parallelization. However, when using ParSeq.fill
method, the performance doesn't seem to be better any better than sequential initialization (Seq.fill
). If I do the same task, but initializing the collection with map
, then it is much faster.
To show my point, I set up the following example:
import scala.collection.parallel.immutable.ParSeq
import scala.util.Random
object Timer {
def apply[A](f: => A): (A, Long) = {
val s = System.nanoTime
val ret = f
(ret, System.nanoTime - s)
}
}
object ParallelBenchmark extends App {
def randomIsPrime: Boolean = {
val n = Random.nextInt(1000000)
(2 until n).exists(i => n % i == 0)
}
val seqSize = 100000
val (_, timeSeq) = Timer { Seq.fill(seqSize)(randomIsPrime) }
println(f"Time Seq:\t\t $timeSeq")
val (_, timeParFill) = Timer { ParSeq.fill(seqSize)(randomIsPrime) }
println(f"Time Par Fill:\t $timeParFill")
val (_, timeParMap) = Timer { (0 until seqSize).par.map(_ => randomIsPrime) }
println(f"Time Par map:\t $timeParMap")
}
And the result is:
Time Seq: 32389215709
Time Par Fill: 32730035599
Time Par map: 17270448112
Clearly showing that the fill method is not running in parallel.
The parallel collections library in Scala can only parallelize existing collections,
fill
hasn't been implemented yet (and may never be). Your method of using aRange
to generate a cheap placeholder collection is probably your best option if you want to see a speed boost.Here's the underlying method being called by
ParSeq.fill
, obviously not parallel.