I am using spark.readStream to read data from Kafka and running an explode on the resulting dataframe.
I am trying to save the result of the explode in a Hive table and I am not able to find any solution for that.
I tried the following method but it doesn't work (it runs but I don't see any new partitions created)
val query = tradelines.writeStream.outputMode("append")
.format("memory")
.option("truncate", "false")
.option("checkpointLocation", checkpointLocation)
.queryName("tl")
.start()
sc.sql("set hive.exec.dynamic.partition.mode=nonstrict;")
sc.sql("INSERT INTO TABLE default.tradelines PARTITION (dt) SELECT * FROM tl")
Check HDFS for the
dtpartitions on the file systemYou need to run
MSCK REPAIR TABLEon the hive table to see new partitions.If you aren't doing anything special with Spark, then it's worth pointing out that Kafka Connect HDFS is capable of registering Hive partitions directly from Kafka.