In Apache Spark, partitioners define how records are distributed across partitions when data is shuffled. They all implement a `getPartition(key: Any): Int` method that maps each key to a partition index.
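For context, the contract looks roughly like the sketch below. It is a minimal, self-contained illustration, not Spark's code: `KeyPartitioner` and `SimpleHashPartitioner` are hypothetical names standing in for `org.apache.spark.Partitioner` and `HashPartitioner`, so the example runs without a Spark dependency. The hashing logic (null keys to partition 0, otherwise a non-negative `hashCode` modulo) follows the same idea as Spark's `HashPartitioner`.

```scala
// Hypothetical stand-in for org.apache.spark.Partitioner, so this sketch runs without Spark.
trait KeyPartitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hash-based partitioning in the spirit of Spark's HashPartitioner.
class SimpleHashPartitioner(partitions: Int) extends KeyPartitioner {
  require(partitions > 0, s"Number of partitions ($partitions) must be positive.")
  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case null => 0 // null keys always go to the first partition
    case _ =>
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw // keep the result in [0, numPartitions)
  }
}

val p = new SimpleHashPartitioner(4)
println(Seq("a", "b", -7, null).map(p.getPartition)) // prints List(1, 2, 1, 0)
```

Note that `getPartition` only needs the key and the partition count, so an instance like this carries no reference to the data it partitions.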
In particular, to construct a `RangePartitioner`, developers have to pass in the RDD itself. So I am confused about where partitioners actually perform their work: on the executors, the driver, or the master?
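The reason a range partitioner needs the data is that its split points depend on the key distribution: Spark's `RangePartitioner` derives its range bounds from a sample of the RDD's keys. The sketch below illustrates only that idea; `SimpleRangePartitioner` is a hypothetical class that takes an already collected `Seq[Int]` sample instead of an RDD, picks approximately equal-rank bounds, and then assigns keys by binary search. It says nothing about where Spark runs each step.

```scala
// Hypothetical sketch of range partitioning; it takes a plain sample instead of an RDD.
class SimpleRangePartitioner(sampleKeys: Seq[Int], partitions: Int) {
  require(partitions > 0, s"Number of partitions ($partitions) must be positive.")

  // Sort the sample and take (partitions - 1) upper bounds at roughly equal ranks.
  val rangeBounds: Array[Int] = {
    val sorted = sampleKeys.sorted.toArray
    if (sorted.isEmpty || partitions == 1) Array.empty[Int]
    else (1 until partitions).map(i => sorted((i * sorted.length) / partitions)).distinct.toArray
  }

  // Duplicate bounds are dropped, so the effective count can be lower than requested.
  def numPartitions: Int = rangeBounds.length + 1

  // Partition i holds keys <= rangeBounds(i); keys above every bound go to the last partition.
  def getPartition(key: Int): Int = {
    val idx = java.util.Arrays.binarySearch(rangeBounds, key)
    if (idx >= 0) idx else -idx - 1 // a negative result encodes -(insertion point) - 1
  }
}

val r = new SimpleRangePartitioner(1 to 100, 4)
println(r.rangeBounds.toSeq) // prints the bounds 26, 51, 76
println(Seq(1, 26, 27, 100).map(r.getPartition)) // prints List(0, 0, 1, 3)
```

The split between computing the bounds (which needs the data) and looking up a key (which needs only the bounds) is what makes the question of where each part runs interesting.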