We are trying to connect to a remote Oracle database running as AmazonRDS using SSO wallet configured at our end and Apache Spark. We are able to load the data using the spark-shell
utility as mentioned below
Start the spark shell with jdbc and oraclepki jar added to the classpath
spark-shell --driver-class-path /path/to/ojdbc8.jar:/path/to/oraclepki.jar
This is the JDBC url used:
val JDBCURL="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=www.example.aws.server.com)(PORT=1527))(CONNECT_DATA=(SID=XXX))(SECURITY = (SSL_SERVER_CERT_DN =\"C=US,ST=xxx,L=ZZZ,O=Amazon.com,OU=RDS,CN=www.xxx.aws.zzz.com\")))"
And below is the Spark jdbc call to load the data
spark.read.format("jdbc").option("url",JDBCURL)
.option("user","USER")
.option("oracle.net.tns_admin","/path/to/tnsnames.ora")
.option("oracle.net.wallet_location","(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet/)))")
.option("password", "password")
.option("javax.net.ssl.trustStore","/path/to/cwallet.sso")
.option("javax.net.ssl.trustStoreType","SSO")
.option("dbtable",QUERY)
.option("driver", "oracle.jdbc.driver.OracleDriver").load
But when we are trying to run it using the spark-submit
command we are getting the below error:
Exception in thread "main" java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:774)
at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:688)
...
...
...
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:523)
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:521)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:660)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:286)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1438)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:518)
... 28 more
Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
at oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:597)
at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:143)
at oracle.net.nt.ConnOption.connect(ConnOption.java:161)
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470)
... 33 more
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:642)
at oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:580)
... 36 more
Caused by: java.security.KeyStoreException: SSO not found
at java.security.KeyStore.getInstance(KeyStore.java:851)
at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:628)
... 37 more
Caused by: java.security.NoSuchAlgorithmException: SSO KeyStore not available
at sun.security.jca.GetInstance.getInstance(GetInstance.java:159)
at java.security.Security.getImpl(Security.java:695)
at java.security.KeyStore.getInstance(KeyStore.java:848)
I am very new to spark and might be doing something wrong here. This is how I am trying to configure the Config
val conf = new SparkConf().setAppName(JOB_NAME)
conf.set("javax.net.ssl.trustStore", "/path/to/cwallet.sso");
conf.set("javax.net.ssl.trustStoreType", "SSO")
conf.set("oracle.net.tns_admin", "/path/to/tnsnames.ora")
conf.set("oracle.net.wallet_location", "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet/dir/)))")
conf.set("user", "user")
conf.set("password", "pass")
Below is the spark-submit
command used
spark-submit --class fully.qualified.path.to.main \
--jars /path/to/ojdbc8.jar,/path/to/oraclepki.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
--deploy-mode client --files /path/to/hive-site.xml --master yarn \
--driver-memory 12G \
--conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO" \
--executor-cores 4 --executor-memory 12G \
--num-executors 20 /path/to/application.jar /path/to/application_custom_config.conf
Also tried to add
--conf 'spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO'
and
--files /path/to/cwallet.sso,/path/to/tnsnames.ora
to the spark-submit
command but without any luck.
What exactly I am doing wrong here? Also tried the solution mentioned in this post but getting the same error.Do I need to make sure trustStore should be accessible on each executor node
? If that is the case then why the spark-shell
command is working fine ? Does this mean spark-cli does not include any worker nodes to execute the command ?
Please advice
UPDATE:
It looks like you're using the JDBC driver from 12.1.0.2. Please upgrade to 18.3 which you can download from oracle.com/technetwork/database/application-development/jdbc/… Some changes have been made to make the use of wallets easier. -- @Jean de Lavarene
After following the suggested change by @Jean de Lavarene got rid of the initial error but below is what I am getting now
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, example.server.net, executor 2): java.sql.SQLException: PKI classes not found. To use 'connect /' functionality, oraclepki.jar must be in the classpath: java.lang.NoClassDefFoundError: oracle/security/pki/OracleWallet
at oracle.jdbc.driver.PhysicalConnection.getSecretStoreCredentials(PhysicalConnection.java:3058)
at oracle.jdbc.driver.PhysicalConnection.parseUrl(PhysicalConnection.java:2823)
When I run this in spark local mode : --master local[*]
it works fine but failing with yarn
mode.
I am already using the --jars
command with comma separated list of jars. What I found is :
1) --jars
expect the path to be a local one, and then it copies them to the HDFS path
2) using file:///
at the beginning is not working
3) If I do not specify the --jars
parameter the program is asking for missing JDBC driver class. Once I specify the ojdbc8.jar
using --jars then the errors go away and start giving the oraclepki.jar
not found error. I have NO CLUE why this is happening.
4) Also tried using :
as the separator while specifying multiple jars but without any luck
UPDATE 2
I was able to resolve the oraclepki.jar
not found exception by using the
--driver-class-path /path/to/oraclepki.jar:/path/to/osdt_cert.jar:/path/to/others.jar
but once we are running into the --master yarn
mode then the following exception is being shown
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)
at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:322)
... 32 more
Caused by: java.io.FileNotFoundException: /path/to/cwallet.sso (No such file or directory)
As per my understanding it looks like when it launching the job from the worker node the cwallet.sso
file path is not available on those nodes. We tried to specify a HDFS path for the wallet but the utility expects a local path to be provided when creating the wallet.
So do we need to manually copy the wallet file to all worker nodes ? Or is there any better alternatives to achieve this?
Please advice
Observations and Notes:
This error says
KeyStore
, as opposed toTrustStore
. But that's likely just misleading terminology and yet, error messages of both the java and oracle classes use the same term.Root CA certs go in
TrustStore
, but if you were using client keys, you would need theseKeyStore
properties:The Oracle JDBC SSL manual itself contributes to the terminology confusion since it never even addresses the
TrustStore
:Based on the above alternative, if you just specify the wallet property, you can apparently omit all the
javax.net.ssl
lines?As you found, using an Oracle 12.1 client to connect to an Oracle 18+ database is problematic with SSL. Driver incompatibility issues are showing up as obscure errors (even though this old driver is supposedly "forward compatible".
Oracle 12c drivers don't support all the enhanced features in connection strings (Easy Connect Plus), but some connection properties like
wallet_location
were added in the 19c JDBC driver. So using a fully specified self-contained ezconnect connection URL like that may make things easier.Another option is to use Java JKS TrustStore instead of Oracle Wallets.
If you still see issues, use
javax.net.debug=all
to capture tracing details.Always test your connections with
tnsping
andsqlplus
before testing in pyspark.