How do I save spark.writeStream results in hive?


I am using spark.readStream to read data from Kafka and running an explode on the resulting DataFrame. I am trying to save the result of the explode into a Hive table, but I cannot find a working solution. I tried the following approach, but it doesn't work: the query runs, yet I don't see any new partitions being created.

val query = tradelines.writeStream.outputMode("append")
  .format("memory")
  .option("truncate", "false")
  .option("checkpointLocation", checkpointLocation)
  .queryName("tl")
  .start() 

sc.sql("set hive.exec.dynamic.partition.mode=nonstrict;")

sc.sql("INSERT INTO TABLE default.tradelines PARTITION (dt) SELECT * FROM tl")

1 Answer


Check HDFS to see whether any dt partition directories were actually created on the file system.

You need to run MSCK REPAIR TABLE on the Hive table so the metastore picks up the new partitions.
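One thing to note is that the memory sink in the question only keeps rows in the driver's memory, so it never produces files or partitions on HDFS. A minimal sketch of an approach that does write partitioned files is below, assuming the table's warehouse path is /user/hive/warehouse/tradelines and the partition column is dt (both taken from the question; the path is a guess and may differ in your setup):

```scala
// Sketch: write the stream as Parquet files under the Hive table's location,
// partitioned by dt, instead of using the in-memory sink.
val query = tradelines.writeStream
  .outputMode("append")
  .format("parquet")                                  // file sink, persists to HDFS
  .partitionBy("dt")                                  // creates dt=... directories
  .option("path", "/user/hive/warehouse/tradelines")  // assumed table location
  .option("checkpointLocation", checkpointLocation)
  .start()

// Once files have landed, register the new partition directories
// with the Hive metastore:
spark.sql("MSCK REPAIR TABLE default.tradelines")
```

The file sink only appends new files, so MSCK REPAIR TABLE (or ALTER TABLE ... ADD PARTITION) has to be re-run whenever new dt directories appear before Hive queries can see them.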

If you aren't doing anything special with Spark itself, it's worth pointing out that the Kafka Connect HDFS sink connector can write data from Kafka to HDFS and register Hive partitions for you directly.
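For illustration, a Kafka Connect HDFS sink configuration along these lines enables the Hive integration; the connector name, topic, host names, and flush size here are placeholders, not values from the question:

```json
{
  "name": "tradelines-hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "tradelines",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "1000",
    "hive.integration": "true",
    "hive.metastore.uris": "thrift://metastore:9083",
    "hive.database": "default",
    "partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitioner",
    "partition.field.name": "dt"
  }
}
```

With hive.integration enabled, the connector creates the Hive table and adds partitions as it commits files, so no separate MSCK REPAIR TABLE step is needed.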