Spark: passing broadcast variable to executors


I am passing a broadcast variable to all my executors using the following code. The code seems to work, but I don't know if my approach is good enough. Just want to see if anyone has any better suggestions. Thank you very much!

val myRddMap = sc.textFile("input.txt").map(t => myParser.parse(t))
val myHashMapBroadcastVar = sc.broadcast(myRddMap.collect().toMap)

where myRddMap is of type org.apache.spark.rdd.RDD[(String, (String, String))]

Then I have a utility function which I pass in RDDs and variables like:

val myOutput = myUtiltityFunction.process(myRDD1, myHashMapBroadcastVar)

So is the above code a good way of handling broadcast variables, or is there a better approach? Thanks!
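For reference, here is a self-contained sketch of the pattern in the question (the file name and the parser are stand-ins for `input.txt` and `myParser.parse`, which are not shown in the question). One small refinement worth noting: `collectAsMap()` builds the driver-side `Map` directly, skipping the intermediate `Array` that `collect().toMap` creates:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastMapExample {
  // Hypothetical parser standing in for myParser.parse:
  // turns a line "k,v1,v2" into (k, (v1, v2))
  def parse(line: String): (String, (String, String)) = {
    val Array(k, v1, v2) = line.split(",")
    (k, (v1, v2))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val myRddMap = sc.textFile("input.txt").map(parse)

    // collectAsMap() assembles the Map on the driver in one step;
    // the broadcast then ships it to each executor once
    val myHashMapBroadcastVar = sc.broadcast(myRddMap.collectAsMap())

    // On executors, read the cached copy through .value
    val enriched = sc.parallelize(Seq("a", "b"))
      .map(k => (k, myHashMapBroadcastVar.value.get(k)))

    enriched.collect().foreach(println)
    spark.stop()
  }
}
```

Note this only makes sense when the collected map comfortably fits in driver and executor memory; for large lookup tables a join is usually the better tool.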

1 Answer
Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Broadcast variables are sent to every node once and cached there, so it doesn't matter whether you access them from a utility function or anywhere else. As far as I can tell you are doing the right thing; nothing here should cause poor performance.
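To illustrate the point about utility functions, here is a hedged sketch of what `myUtiltityFunction.process` might look like (the name and signature are assumptions based on the question). The key detail is that the `Broadcast` handle itself is tiny and serializable, so passing it into a method and capturing it in a closure is cheap; each executor fetches the underlying map once and caches it, and `.value` inside the closure reads that local copy:

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

object MyUtilityFunction {
  // Pure lookup logic, kept separate so it can be unit-tested without Spark
  def lookup(map: scala.collection.Map[String, (String, String)],
             key: String): Option[(String, String)] =
    map.get(key)

  // Only the small Broadcast handle is serialized into the task closure;
  // the map itself is fetched and cached once per executor
  def process(rdd: RDD[String],
              bcast: Broadcast[scala.collection.Map[String, (String, String)]])
      : RDD[(String, Option[(String, String)])] =
    rdd.map(key => (key, lookup(bcast.value, key)))
}
```

When the broadcast data is no longer needed, calling `unpersist()` on the handle frees the cached copies on the executors.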