How can I re-save an R dataframe with SparkR::saveAsTable() under the same name as an already existing table after changing columns?
I am working with R on Databricks and saved an R dataframe table_x as a table in the database, using this:
data_x <- SparkR::createDataFrame(table_x)
SparkR::saveAsTable(data_x, tableName="table_x", mode = "overwrite")
Later I added columns to the table and also changed some column names. When I try to save it again, it fails with a schema-mismatch error, even after I removed table_x from the database with "drop table".
If I try
data_x <- SparkR::createDataFrame(table_x)
SparkR::saveAsTable(data_x, tableName="table_x", mode = "error")
the error message says: "The associated location (...) is not empty and also not a Delta table."
So even though I dropped table_x, its location is not empty and its old table schema is still there?
Below the error message it says:
To overwrite your schema or change partitioning, please set:
'.option("overwriteSchema", "true")'.
How can I do this in SparkR::saveAsTable? The RDocumentation page says that additional options can be passed to the method, but how exactly would I do this?
Very simple: you just add the argument overwriteSchema = "true" to the SparkR::saveAsTable call. Check the documentation for the ... argument of SparkR::saveAsTable, which says "additional option(s) passed to the method". If you are not familiar with how the ... argument works, see Hadley Wickham's book Advanced R: https://adv-r.hadley.nz/functions.html#fun-dot-dot-dot
The underlying reason is that, by default, Spark dataframes on Databricks are saved in Delta format instead of Parquet, and Delta enforces schema consistency as one of its many benefits.
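To illustrate how ... forwards named arguments, here is a tiny base-R sketch (the function save_with_options is made up purely for illustration and has nothing to do with Spark):

```r
# A wrapper that forwards any extra named arguments via `...`,
# just as SparkR::saveAsTable forwards them to the data source.
save_with_options <- function(df, ...) {
  # Extra named arguments arrive here as a list and could be
  # passed straight through to another function.
  args <- list(...)
  names(args)
}

save_with_options(NULL, overwriteSchema = "true", mode = "overwrite")
# returns c("overwriteSchema", "mode")
```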
Worked out example:
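A sketch of the full sequence, assuming (as in the question) that table_x is an existing R dataframe and that an active Spark session is available on Databricks:

```r
library(SparkR)

# table_x is the R dataframe from the question (assumed to exist)
data_x <- SparkR::createDataFrame(table_x)

# overwriteSchema is forwarded through `...` to the underlying writer,
# equivalent to .option("overwriteSchema", "true") in Scala/Python,
# so Delta allows the changed schema to replace the old one.
SparkR::saveAsTable(
  data_x,
  tableName = "table_x",
  mode = "overwrite",
  overwriteSchema = "true"
)
```

Note that this only applies with mode = "overwrite"; with mode = "error" the write still fails if anything exists at the table's location.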