I need to fetch all versions of an HBase table's cells using the "org.apache.hadoop.hbase.spark" connector, but it always returns only the latest version. Is there a way to read all versions into a Spark DataFrame?
Cloudera distribution: 7.1.8
HBase version: 2.4
Spark version: 3.3
I configured the HBase-Spark connector by following the Cloudera blogs.
HBase Table:
scan 'test', {VERSIONS => 3}
ROW COLUMN+CELL
row1 column=cf:data, timestamp=1970-01-01T05:30:00.003, value=value3
row1 column=cf:data, timestamp=1970-01-01T05:30:00.002, value=value2
row1 column=cf:data, timestamp=1970-01-01T05:30:00.001, value=value1
There is one row (row1) with 3 versions of cf:data ("value1", "value2", "value3").
Below is the Spark code that reads the "test" table:
val hbase_column_mapping = "device STRING :key, data STRING cf:data"
val hbase_table = "test"
val df = spark.read.format("org.apache.hadoop.hbase.spark")
.option("hbase.columns.mapping", hbase_column_mapping)
.option("hbase.table", hbase_table)
.option("hbase.spark.query.maxVersions", 3)
.option("hbase.spark.use.hbasecontext", false)
.load()
df.show()
Output is:
+------+------+
| data|device|
+------+------+
|value3| row1|
+------+------+
Even after setting the option "hbase.spark.query.maxVersions", it still fetches only the latest version. The requirement is to fetch all versions from the HBase table into Spark; in the above case the DataFrame should contain 3 records, one per version.
Expected output:
+------+------+
| data|device|
+------+------+
|value1| row1|
|value2| row1|
|value3| row1|
+------+------+
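If the connector itself cannot do this, I assume the fallback is to read raw cells (e.g. via the RDD API) and flatten each version into its own row myself. Here is a minimal, self-contained sketch of just that flattening step, with plain tuples standing in for HBase `Cell`s (hypothetical mock data, no cluster or HBase dependency needed), producing the (data, device) shape shown in the expected output above:

```scala
object FlattenVersions {
  // (rowKey, qualifier, timestamp, value) stands in for one HBase cell version.
  type MockCell = (String, String, Long, String)

  // Flatten every cell version into its own (data, device) row, oldest first,
  // instead of keeping only the latest version per row key.
  def flatten(cells: Seq[MockCell]): Seq[(String, String)] =
    cells.sortBy(_._3).map { case (rowKey, _, _, value) => (value, rowKey) }

  def main(args: Array[String]): Unit = {
    // The three versions of cf:data for row1, as in the scan output above.
    val cells = Seq(
      ("row1", "cf:data", 3L, "value3"),
      ("row1", "cf:data", 2L, "value2"),
      ("row1", "cf:data", 1L, "value1")
    )
    flatten(cells).foreach(println)
  }
}
```

The question remains whether the connector can produce this shape directly, without dropping down to raw cells.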