I am running the code below in IntelliJ with Spark 2.4.2:
// Imports used: java.util.Arrays, org.apache.spark.SparkConf,
// org.apache.spark.api.java.JavaPairRDD, org.apache.spark.api.java.JavaSparkContext, scala.Tuple2
try (JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("Dummy").setMaster("local"))) {
    JavaPairRDD<Integer, Integer> integerIntegerJavaPairRDD = jsc.parallelizePairs(Arrays.asList(new Tuple2<>(1, 2), new Tuple2<>(3, 4)), 2);
    integerIntegerJavaPairRDD.collect();
    System.out.println(integerIntegerJavaPairRDD.partitioner().get());
}
A NullPointerException is thrown:
Exception in thread "main" java.lang.NullPointerException
at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
at org.apache.spark.api.java.Optional.get(Optional.java:115)
at Dummy.main(Dummy.java:44)
This exception is thrown at the line System.out.println(integerIntegerJavaPairRDD.partitioner().get());.
I have been reading through the documentation, and it seems that the default partitioner for a pair RDD is HashPartitioner, so I expected the code above to print a partitioner.
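For comparison, here is a minimal sketch (inside the same try block; HashPartitioner comes from org.apache.spark, and the variable name partitioned is mine) showing that when a partitioner is attached explicitly, the same call does print a value:

JavaPairRDD<Integer, Integer> partitioned =
        integerIntegerJavaPairRDD.partitionBy(new HashPartitioner(2));
// The Optional is now present, so get() succeeds and prints
// something like org.apache.spark.HashPartitioner@1a2b3c
System.out.println(partitioned.partitioner().get());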
The NullPointerException indicates that no partitioner is set.
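Checking the Optional before calling get() confirms it is empty. A minimal sketch (Optional here is org.apache.spark.api.java.Optional and Partitioner is org.apache.spark.Partitioner; the variable name p is mine):

Optional<Partitioner> p = integerIntegerJavaPairRDD.partitioner();
System.out.println(p.isPresent()); // prints false
System.out.println(p.orNull());    // prints null instead of throwing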
The behaviour is the same in spark-shell. From this I am inferring the following:
When no partitioner is set explicitly, there is no default partitioner: partitioner() returns an empty Optional.
Is my understanding correct?
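For reference, a shuffle operation such as reduceByKey does attach a partitioner. A minimal sketch with the same RDD (the variable name reduced is mine; the exact string printed may differ):

JavaPairRDD<Integer, Integer> reduced = integerIntegerJavaPairRDD.reduceByKey(Integer::sum);
System.out.println(reduced.partitioner().isPresent()); // prints true
System.out.println(reduced.partitioner().get());       // a HashPartitioner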