.format("org.apache.phoenix.spark") vs .format("jdbc")


What is the difference between using .format("org.apache.phoenix.spark") and .format("jdbc") when loading an HBase table (through Phoenix) into a Spark DataFrame?

val tracesDF = spark.sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", hbaseTblName)
  .option("zkUrl", appConf.getString("zookeeper_url"))
  .load()

vs

val tracesDF = spark.sqlContext.read
  .format("jdbc")
  .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
  .option("url", appConf.getString("hbasedb_url"))
  .option("dbtable", hbaseTblName)
  .load()

Some related behavior I ran into:

  • I create the HBase table through a JDBC statement: hbaseCon.createStatement().execute("CREATE TABLE ...")
  • The DataFrame loaded with .format("org.apache.phoenix.spark") is empty, while the one loaded with .format("jdbc") returns the data properly.
  • I need to specify the column family [tracesDF.select(...,"`B.SAMPLES_BINARY`")] when using .format("org.apache.phoenix.spark"), but not when using .format("jdbc") [tracesDF.select(...,"SAMPLES_BINARY")].
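
To make the last point concrete, here is a minimal sketch that reproduces only the column-naming symptom in plain Spark, without Phoenix or HBase. It assumes the Phoenix connector exposes the column under the literal name B.SAMPLES_BINARY (family prefix, a dot, then the column), while the JDBC source exposes it as SAMPLES_BINARY. The object name, the local SparkSession, and the sample rows are all made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo: not part of the original job, only illustrates column naming.
object ColumnFamilyNameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-family-name-demo")
      .getOrCreate()
    import spark.implicits._

    // Assumed shape of the DataFrame from .format("org.apache.phoenix.spark"):
    // the column name itself contains a dot.
    val phoenixLike = Seq((1, "payload")).toDF("ID", "B.SAMPLES_BINARY")

    // Assumed shape of the DataFrame from .format("jdbc"): no family prefix.
    val jdbcLike = Seq((1, "payload")).toDF("ID", "SAMPLES_BINARY")

    // Backticks tell Spark to treat the dot as part of the column name
    // instead of as struct-field access on a column named B.
    phoenixLike.select("ID", "`B.SAMPLES_BINARY`").show()

    // A name without a dot needs no quoting.
    jdbcLike.select("ID", "SAMPLES_BINARY").show()

    spark.stop()
  }
}
```

Calling tracesDF.printSchema() on each of the two real DataFrames should show whether the column names actually differ in this way.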