I am trying to run a Spark job using Google Cloud Dataproc Serverless. This job runs fine on a normal Dataproc Spark cluster. It uses a Hive metastore backed by a MySQL database. When I run the job using Dataproc batches, I get the below error:
Caused by: java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://metastore.example.com/metastore, username = test123. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:mysql://metastore.example.com/metastore
I have tried including the MySQL connector jar in my pom and adding the below config:
spark.hadoop.javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
Neither of these worked. Kindly help.
Dataproc Serverless Spark does not come with a pre-installed MySQL driver. If your job needs it, you have to supply the MySQL driver as a dependency of the Dataproc Serverless Spark batch yourself, so that it ends up on the driver's classpath at runtime.
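One way to do this, sketched below, is to stage the connector jar in a GCS bucket and pass it with the `--jars` flag when submitting the batch. The bucket paths, region, connector version, and class name here are placeholders; substitute your own.

```shell
# Hypothetical example: bucket, region, class, and jar version are placeholders.
# Both your application jar and the MySQL connector jar are listed in --jars,
# so the driver class is available on the batch's classpath.
gcloud dataproc batches submit spark \
  --region=us-central1 \
  --class=com.example.MyJob \
  --jars=gs://my-bucket/jars/my-job.jar,gs://my-bucket/jars/mysql-connector-java-8.0.33.jar
```

Alternatively, if you are already declaring the connector in your pom, you can build an uber-jar (e.g. with the Maven Shade plugin) so the driver classes are bundled into your application jar; merely listing the connector as a compile-time dependency does not put it on the batch's runtime classpath.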