I am trying to run a Spark job using Google Cloud Dataproc Serverless. This job runs fine on a normal Dataproc Spark cluster. It uses a Hive metastore backed by a MySQL database. When I run the job using Dataproc batches, I get the below error:
Caused by: java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://metastore.example.com/metastore, username = test123. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:mysql://metastore.example.com/metastore
I have tried including the MySQL connector jar in my pom and adding the below config:
spark.hadoop.javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
Neither of these worked. Kindly help.
Dataproc Serverless Spark does not come with a pre-installed MySQL driver. If your job needs it, you have to supply the MySQL driver as a dependency of the Dataproc Serverless Spark batch yourself, so that it ends up on the driver's classpath at runtime.
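One way to do this, sketched below, is to stage the connector jar in a GCS bucket and pass it with the `--jars` flag when submitting the batch. The bucket paths, region, connector version, and class name here are placeholders; substitute your own.

```shell
# Hypothetical example: bucket, region, class, and jar version are placeholders.
# Both your application jar and the MySQL connector jar are listed in --jars,
# so the driver class is available on the batch's classpath.
gcloud dataproc batches submit spark \
  --region=us-central1 \
  --class=com.example.MyJob \
  --jars=gs://my-bucket/jars/my-job.jar,gs://my-bucket/jars/mysql-connector-java-8.0.33.jar
```

Alternatively, if you are already declaring the connector in your pom, you can build an uber-jar (e.g. with the Maven Shade plugin) so the driver classes are bundled into your application jar; merely listing the connector as a compile-time dependency does not put it on the batch's runtime classpath.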