I have a Hive Parquet table which I am creating using Spark 2.3 API df.saveAstable
. There is a separate Hive process that alters the same parquet table to add columns (based on requirements).
However, next time when I try to read the same parquet table into Spark dataframe, the new column which was added to the parquet table using Hive Alter Table command is not showing up in the df.printSchema
output.
Based on initial analysis, it seems that there might be some conflict, and Spark is using its own schema instead of reading the Hive metastore. Hence, I tried the below options :
Changing the spark setting:
spark.sql.hive.convertMetastoreParquet=false
and Refreshing the spark catalog:
spark.catalog.refreshTable("table_name")
However, the above two options are not solving the problem.
Any suggestions or alternatives would be super helpful.
To fix this solution, you have to use the same alter command used in hive to spark-shell as well.