Error connecting to SQL Server via JDBC with PySpark in Dataproc


I am trying to connect to a MS SQL Server instance via JDBC with PySpark in Dataproc.

I am getting the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o79.jdbc.
: java.lang.ClassNotFoundException: mssql-jdbc-12.4.0.jre11.jar

The main file (main.py):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('my_app').getOrCreate()

# Connection details (anonymised)
connection_string = 'jdbc:sqlserver://1.2.3.4:1433;databaseName=my_db;'
properties = {'user': 'my_user', 'password': 'my_password'}

df = spark.read.jdbc(
    url=connection_string,
    table='my_table',
    properties=properties
)
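
For reference, my understanding is that the JDBC driver class normally has to be named explicitly in the connection properties. A minimal sketch of what I mean (using the Microsoft driver's documented class name com.microsoft.sqlserver.jdbc.SQLServerDriver; host, database, and credentials are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('my_app').getOrCreate()

# 'driver' names the class inside the jar, not the jar file itself;
# the jar still has to be on the classpath (e.g. supplied via --jars).
properties = {
    'user': 'my_user',
    'password': 'my_password',
    'driver': 'com.microsoft.sqlserver.jdbc.SQLServerDriver',
}

df = spark.read.jdbc(
    url='jdbc:sqlserver://1.2.3.4:1433;databaseName=my_db;',
    table='my_table',
    properties=properties,
)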

The gcloud command:

gcloud dataproc batches submit pyspark \
    --batch my_batch main.py \
    --jars mssql-jdbc-12.4.0.jre11.jar \
    --properties driver=mssql-jdbc-12.4.0.jre11.jar
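
For comparison, this is the shape I would expect the submit command to take, assuming the jar is staged in a bucket (gs://my-bucket/jars/... is a placeholder path) and the driver class is set in the Python connection properties rather than through --properties:

gcloud dataproc batches submit pyspark main.py \
    --batch my_batch \
    --jars gs://my-bucket/jars/mssql-jdbc-12.4.0.jre11.jar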
