Default Partitioner of JavaSparkContext


I am running the code below in IntelliJ with Spark 2.4.2:

try (JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("Dummy").setMaster("local"))) {
    JavaPairRDD<Integer, Integer> integerIntegerJavaPairRDD =
            jsc.parallelizePairs(Arrays.asList(new Tuple2<>(1, 2), new Tuple2<>(3, 4)), 2);
    integerIntegerJavaPairRDD.collect();
    System.out.println(integerIntegerJavaPairRDD.partitioner().get());
}

A NullPointerException is thrown:

Exception in thread "main" java.lang.NullPointerException
    at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
    at org.apache.spark.api.java.Optional.get(Optional.java:115)
    at Dummy.main(Dummy.java:44)

This exception is thrown at the line System.out.println(integerIntegerJavaPairRDD.partitioner().get());.

I have been reading through the documentation, and it seems that the default partitioner for a pair RDD is HashPartitioner, so I expected the code above to print a partitioner.

The NullPointerException indicates that no partitioner is set at all.

I see the same behaviour in the spark-shell. From this I am inferring that, when not set explicitly, the default partitioner is null, i.e. the Optional returned by partitioner() is empty.
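Below is a minimal sketch of what I mean. The isPresent() guard and the explicit HashPartitioner(2) via partitionBy are my own additions for illustration, not anything Spark applies by default:

public class PartitionerCheck {
    public static void main(String[] args) {
        try (JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setAppName("Dummy").setMaster("local"))) {
            JavaPairRDD<Integer, Integer> pairs = jsc.parallelizePairs(
                    Arrays.asList(new Tuple2<>(1, 2), new Tuple2<>(3, 4)), 2);

            // Guard with isPresent() instead of calling get() directly;
            // the Optional is empty because no partitioner was ever set.
            System.out.println(pairs.partitioner().isPresent()); // false

            // After an explicit partitionBy, a partitioner is present.
            JavaPairRDD<Integer, Integer> partitioned =
                    pairs.partitionBy(new HashPartitioner(2));
            System.out.println(partitioned.partitioner().get()); // HashPartitioner
        }
    }
}

(Imports assumed: java.util.Arrays, org.apache.spark.HashPartitioner, org.apache.spark.SparkConf, org.apache.spark.api.java.JavaPairRDD, org.apache.spark.api.java.JavaSparkContext, scala.Tuple2.)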

Is my understanding correct?
