Trying to use google-cloud-dataproc-serverless with the spark.jars.repositories option

gcloud beta dataproc batches submit pyspark sample.py --project=$GCP_PROJECT --region=$MY_REGION --properties \
spark.jars.repositories='https://my.repo.com:443/artifactory/my-maven-prod-group',\
spark.jars.packages='com.spark.mypackage:my-module-jar',spark.dataproc.driverEnv.javax.net.ssl.trustStore=.,\
spark.driver.extraJavaOptions='-Djavax.net.ssl.trustStore=. -Djavax.net.debug=true' \
--files=my-ca-bundle.crt

which gives this exception:

 javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException

I tried to set the javax.net.ssl.trustStore property via both spark.dataproc.driverEnv and spark.driver.extraJavaOptions, but it's not working.

Is it possible to fix this issue by setting the right config properties and values, or is a custom image with pre-installed certificates the only solution?

BEST ANSWER

You need a Java trust store (JKS) with your certificate imported; passing the raw .crt bundle as the trust store will not work. Then submit the batch with:

--files=my-trust-store.jks \
--properties spark.driver.extraJavaOptions='-Djavax.net.ssl.trustStore=./my-trust-store.jks',spark.executor.extraJavaOptions='-Djavax.net.ssl.trustStore=./my-trust-store.jks'