Access Spark Hive metastore within a UDF running in the workers (Databricks)

272 Views Asked by Lucas Mendes Mota Da Fonseca At 28 July 2025 at 01:10

Context

I have an operation that must be performed on a number of tables using PySpark. The operation includes querying the Spark metastore (in Databricks) to get some metadata. Since I have many tables, I'm parallelizing the operation across the cluster workers with an RDD, as shown in the code below:

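    from pyspark import SparkContext

    base_spark_context = SparkContext.getOrCreate()
    # Distribute the table names across the workers and run sync_table on each
    rdd = base_spark_context.parallelize(tables_list)
    rdd.map(lambda table_name: sync_table(table_name)).collect()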

The UDF sync_table() runs queries on the metastore, similar to this line:

    spark_client.session.sql("select 1")

Problem

The SQL execution does not succeed. Instead, I get a metastore-related error. Traceback:

py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

(suppressed lines)

Caused by: java.lang.reflect.InvocationTargetException

(suppressed lines)

Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader sun.misc.Launcher$AppClassLoader@16c0663d, see the next exception for details.

(suppressed lines)

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /databricks/spark/work/app-20210413201900-0000/0/metastore_db.

Is there a limitation on accessing the Databricks metastore from within a worker after parallelizing the operation this way? Or is there a way to perform such an operation?
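
Note on a possible direction (not part of the original post, and not confirmed for this cluster): a SparkSession is only usable on the driver, and the traceback shows the worker trying to boot a local Derby metastore_db under its own work directory rather than reaching the cluster metastore. One common workaround is to keep the metastore calls on the driver and parallelize them with threads instead of an RDD. The sketch below assumes sync_table() can be called from the driver; the helper name sync_tables_on_driver and the max_workers default are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sync_tables_on_driver(table_names, sync_table, max_workers=8):
    """Run sync_table(name) for every table using driver-side threads.

    Hypothetical helper: `sync_table` is whatever callable issues the
    metastore queries (for example through the driver's SparkSession).
    Returns two dicts keyed by table name: results and raised exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per table and remember which future maps to which table
        futures = {pool.submit(sync_table, name): name for name in table_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Keep going so one failing table does not abort the rest
                errors[name] = exc
    return results, errors
```

Usage would look like `results, errors = sync_tables_on_driver(tables_list, sync_table)`. Each thread submits its own Spark job from the driver, so the heavy work still runs on the cluster while the metastore access stays where the session exists.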
