I am trying to read data from HBase using Apache Spark, and I want to scan only one specific column. I am creating an RDD of my HBase data as shown below:
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost:2181");
String tableName = "myTable";
conf.set(TableInputFormat.INPUT_TABLE, tableName);
conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "myCol");
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
Here is where I want to convert the JavaPairRDD to a JavaRDD of String:
JavaRDD<String> rdd = ...
How can I achieve this?
You can get a JavaRDD<String> by applying the map function to the pair RDD and converting each Result to a String.
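A minimal sketch of such a map call, assuming the value you want lives in a single column. Here myCol is the column family from the question, while the qualifier name myQualifier and the class name HBaseRowsToStrings are hypothetical and only for illustration:

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

public class HBaseRowsToStrings {
    // Column family taken from the question; the qualifier is a hypothetical placeholder.
    private static final String FAMILY = "myCol";
    private static final String QUALIFIER = "myQualifier";

    // Reads the latest value of FAMILY:QUALIFIER from one row.
    // Returns null when the row has no such cell.
    public static String extractValue(Result result) {
        byte[] value = result.getValue(Bytes.toBytes(FAMILY), Bytes.toBytes(QUALIFIER));
        return Bytes.toString(value);
    }

    // Maps each (row key, Result) pair to the String value of the chosen column.
    public static JavaRDD<String> toStrings(JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD) {
        return hBaseRDD.map(tuple -> extractValue(tuple._2()));
    }
}
```

With the RDD from the question, this would be used as JavaRDD<String> rdd = HBaseRowsToStrings.toStrings(hBaseRDD);. Rows without the column produce null entries, so you may want to filter those out. Also note that TableInputFormat.SCAN_COLUMN_FAMILY restricts the scan to a whole column family; to scan a single column, TableInputFormat.SCAN_COLUMNS takes entries of the form family:qualifier.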