I need to process spark Broadcast variables using Java RDD API. This is my code what I have tried so far:
This is only sample code to check whether its works or not? In my case I need to work on two csv
files.
SparkConf conf = new SparkConf().setAppName("BroadcastVariable").setMaster("local");
JavaSparkContext ctx = new JavaSparkContext(conf);
Map<Integer,String> map = new HashMap<Integer,String>();
map.put(1, "aa");
map.put(2, "bb");
map.put(9, "ccc");
Broadcast<Map<Integer, String>> broadcastVar = ctx.broadcast(map);
List<Integer> list = new ArrayList<Integer>();
list.add(1);
list.add(2);
list.add(9);
JavaRDD<Integer> listrdd = ctx.parallelize(list);
JavaRDD<Object> mapr = listrdd.map(x -> broadcastVar.value());
System.out.println(mapr.collect());
and it prints output like this:
[{1=aa, 2=bb, 9=ccc}, {1=aa, 2=bb, 9=ccc}, {1=aa, 2=bb, 9=ccc}]
and my requirement is :
[{aa, bb, ccc}]
Is it possible to do like in my required way?
I used
JavaRDD<Object> mapr = listrdd.map(x -> broadcastVar.value().get(x));
insted ofJavaRDD<Object> mapr = listrdd.map(x -> broadcastVar.value());
.Its working now.