I need to fetch all versions of an HBase table's cells using the "org.apache.hadoop.hbase.spark" connector, but it always returns only the latest version. Is there a way to read all versions into a Spark DataFrame?
Cloudera distribution: 7.1.8
HBase version: 2.4
Spark version: 3.3
I configured the HBase-Spark connector by following the Cloudera blogs.
HBase Table:
scan 'test', {VERSIONS => 3}
ROW COLUMN+CELL
row1 column=cf:data, timestamp=1970-01-01T05:30:00.003, value=value3
row1 column=cf:data, timestamp=1970-01-01T05:30:00.002, value=value2
row1 column=cf:data, timestamp=1970-01-01T05:30:00.001, value=value1
There is one row (row1) with 3 versions of cf:data ("value1", "value2", "value3").
Below is the Spark code that reads the "test" table:
val hbase_column_mapping = "device STRING :key, data STRING cf:data"
val hbase_table = "test"
val df = spark.read.format("org.apache.hadoop.hbase.spark")
.option("hbase.columns.mapping", hbase_column_mapping)
.option("hbase.table", hbase_table)
.option("hbase.spark.query.maxVersions", 3)
.option("hbase.spark.use.hbasecontext", false)
.load()
df.show()
Output is:
+------+------+
| data|device|
+------+------+
|value3| row1|
+------+------+
Even after setting the option "hbase.spark.query.maxVersions", it still fetches only the latest version. The requirement is to fetch all versions from the HBase table into Spark; in the above case the DataFrame should contain 3 records, one per version.
Expected output:
+------+------+
| data|device|
+------+------+
|value1| row1|
|value2| row1|
|value3| row1|
+------+------+
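If the connector itself cannot do this, I assume the fallback is to read raw cells (e.g. via the RDD API) and flatten each version into its own row myself. Here is a minimal, self-contained sketch of just that flattening step, with plain tuples standing in for HBase `Cell`s (hypothetical mock data, no cluster or HBase dependency needed), producing the (data, device) shape shown in the expected output above:

```scala
object FlattenVersions {
  // (rowKey, qualifier, timestamp, value) stands in for one HBase cell version.
  type MockCell = (String, String, Long, String)

  // Flatten every cell version into its own (data, device) row, oldest first,
  // instead of keeping only the latest version per row key.
  def flatten(cells: Seq[MockCell]): Seq[(String, String)] =
    cells.sortBy(_._3).map { case (rowKey, _, _, value) => (value, rowKey) }

  def main(args: Array[String]): Unit = {
    // The three versions of cf:data for row1, as in the scan output above.
    val cells = Seq(
      ("row1", "cf:data", 3L, "value3"),
      ("row1", "cf:data", 2L, "value2"),
      ("row1", "cf:data", 1L, "value1")
    )
    flatten(cells).foreach(println)
  }
}
```

The question remains whether the connector can produce this shape directly, without dropping down to raw cells.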