In my Spark job, I tried to overwrite a table in each micro-batch of Structured Streaming:

batchDF.write.mode(SaveMode.Overwrite).saveAsTable("mytable")

It generated the following error:

  Can not create the managed table('`mytable`'). The associated location('file:/home/ec2-user/environment/spark/spark-local/spark-warehouse/mytable') already exists.;

I know that in Spark 2.x, the way to work around this issue is to set the following option:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

This works well in Spark 2.x. However, the option was removed in Spark 3.0.0. How should this issue be solved in Spark 3.0.0?

Thanks!


1 Answer


It looks like you run your test data generation and your actual test in the same process. Can you replace the saveAsTable call with createOrReplaceTempView, so the data is registered in Spark's in-memory session catalog instead of being written to the Hive catalog?

Something like: batchDF.createOrReplaceTempView("mytable")
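A minimal sketch of how this could look when wired into the streaming query via foreachBatch (the stream name inputStream and the downstream SQL are illustrative, not from the original question):

```scala
import org.apache.spark.sql.DataFrame

// Sketch: register each micro-batch as a temp view instead of
// overwriting a managed table, so nothing touches the warehouse
// directory. "inputStream" and the SQL below are placeholders.
val query = inputStream.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Replaces any existing view with the same name in the
    // session's in-memory catalog; no files are written.
    batchDF.createOrReplaceTempView("mytable")

    // Downstream logic can then query the view, e.g.:
    batchDF.sparkSession.sql("SELECT COUNT(*) FROM mytable").show()
  }
  .start()
```

Because the temp view lives only in the session's in-memory catalog, it avoids the "associated location already exists" check that managed tables trigger on disk.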