Not able to connect Oracle with Apache Spark using SSO Wallet

3.4k Views Asked by At

We are trying to connect to a remote Oracle database running as AmazonRDS using SSO wallet configured at our end and Apache Spark. We are able to load the data using the spark-shell utility as mentioned below

Start the spark shell with jdbc and oraclepki jar added to the classpath

 spark-shell --driver-class-path /path/to/ojdbc8.jar:/path/to/oraclepki.jar

This is the JDBC url used:

 val JDBCURL="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=www.example.aws.server.com)(PORT=1527))(CONNECT_DATA=(SID=XXX))(SECURITY = (SSL_SERVER_CERT_DN =\"C=US,ST=xxx,L=ZZZ,O=Amazon.com,OU=RDS,CN=www.xxx.aws.zzz.com\")))"

And below is the Spark jdbc call to load the data

 spark.read.format("jdbc").option("url",JDBCURL)
.option("user","USER")
.option("oracle.net.tns_admin","/path/to/tnsnames.ora")
.option("oracle.net.wallet_location","(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet/)))")
.option("password", "password")
.option("javax.net.ssl.trustStore","/path/to/cwallet.sso")
.option("javax.net.ssl.trustStoreType","SSO")
.option("dbtable",QUERY)
.option("driver", "oracle.jdbc.driver.OracleDriver").load    

But when we are trying to run it using the spark-submit command we are getting the below error:

    Exception in thread "main" java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:774)
    at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:688)
    ...
    ...
    ...

    Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
    at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:523)
    at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:521)
    at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:660)
    at oracle.net.ns.NSProtocol.connect(NSProtocol.java:286)
    at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1438)
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:518)
    ... 28 more
    Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
    at oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:597)
    at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:143)
    at oracle.net.nt.ConnOption.connect(ConnOption.java:161)
    at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470)
    ... 33 more
    Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
    at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:642)
    at oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:580)
    ... 36 more
    Caused by: java.security.KeyStoreException: SSO not found
    at java.security.KeyStore.getInstance(KeyStore.java:851)
    at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:628)
    ... 37 more
    Caused by: java.security.NoSuchAlgorithmException: SSO KeyStore not available
    at sun.security.jca.GetInstance.getInstance(GetInstance.java:159)
    at java.security.Security.getImpl(Security.java:695)
    at java.security.KeyStore.getInstance(KeyStore.java:848)

I am very new to spark and might be doing something wrong here. This is how I am trying to configure the Config

    val conf = new SparkConf().setAppName(JOB_NAME)
    conf.set("javax.net.ssl.trustStore", "/path/to/cwallet.sso");
    conf.set("javax.net.ssl.trustStoreType", "SSO")
    conf.set("oracle.net.tns_admin", "/path/to/tnsnames.ora")
    conf.set("oracle.net.wallet_location", "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet/dir/)))")
    conf.set("user", "user")
    conf.set("password", "pass")

Below is the spark-submit command used

    spark-submit --class fully.qualified.path.to.main \
    --jars /path/to/ojdbc8.jar,/path/to/oraclepki.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
    --deploy-mode client --files /path/to/hive-site.xml --master yarn  \
    --driver-memory 12G \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO" \
    --executor-cores 4 --executor-memory 12G \
    --num-executors 20 /path/to/application.jar /path/to/application_custom_config.conf

Also tried to add

--conf 'spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO'

and

--files /path/to/cwallet.sso,/path/to/tnsnames.ora

to the spark-submit command but without any luck. What exactly I am doing wrong here? Also tried the solution mentioned in this post but getting the same error.Do I need to make sure trustStore should be accessible on each executor node ? If that is the case then why the spark-shell command is working fine ? Does this mean spark-cli does not include any worker nodes to execute the command ?

Please advice

UPDATE:

It looks like you're using the JDBC driver from 12.1.0.2. Please upgrade to 18.3 which you can download from oracle.com/technetwork/database/application-development/jdbc/… Some changes have been made to make the use of wallets easier. -- @Jean de Lavarene

After following the suggested change by @Jean de Lavarene got rid of the initial error but below is what I am getting now

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, example.server.net, executor 2): java.sql.SQLException: PKI classes not found. To use 'connect /' functionality, oraclepki.jar must be in the classpath: java.lang.NoClassDefFoundError: oracle/security/pki/OracleWallet
    at oracle.jdbc.driver.PhysicalConnection.getSecretStoreCredentials(PhysicalConnection.java:3058)
    at oracle.jdbc.driver.PhysicalConnection.parseUrl(PhysicalConnection.java:2823) 

When I run this in spark local mode : --master local[*] it works fine but failing with yarn mode.

I am already using the --jars command with comma separated list of jars. What I found is :

1) --jars expect the path to be a local one, and then it copies them to the HDFS path
2) using file:/// at the beginning is not working
3) If I do not specify the --jars parameter the program is asking for missing JDBC driver class. Once I specify the ojdbc8.jar using --jars then the errors go away and start giving the oraclepki.jar not found error. I have NO CLUE why this is happening.
4) Also tried using : as the separator while specifying multiple jars but without any luck

UPDATE 2

I was able to resolve the oraclepki.jar not found exception by using the

    --driver-class-path /path/to/oraclepki.jar:/path/to/osdt_cert.jar:/path/to/others.jar 

but once we are running into the --master yarn mode then the following exception is being shown

    Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
    at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)
    at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:322)
    ... 32 more
    Caused by: java.io.FileNotFoundException: /path/to/cwallet.sso (No such file or directory)

As per my understanding it looks like when it launching the job from the worker node the cwallet.sso file path is not available on those nodes. We tried to specify a HDFS path for the wallet but the utility expects a local path to be provided when creating the wallet.

So do we need to manually copy the wallet file to all worker nodes ? Or is there any better alternatives to achieve this?

Please advice

2

There are 2 best solutions below

0
On
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
Caused by: java.security.KeyStoreException: SSO not found
Caused by: java.security.NoSuchAlgorithmException: SSO KeyStore not available

Observations and Notes:

  1. This error says KeyStore, as opposed to TrustStore. But that's likely just misleading terminology and yet, error messages of both the java and oracle classes use the same term.

  2. Root CA certs go in TrustStore, but if you were using client keys, you would need these KeyStore properties:

    .option("javax.net.ssl.keyStore","/path/to/cwallet.sso")
    .option("javax.net.ssl.keyStoreType","SSO")
    .option("oracle.net.authentication_services","(TCPS)")
    

    The Oracle JDBC SSL manual itself contributes to the terminology confusion since it never even addresses the TrustStore:

    If the oraclepki.jar file is on the CLASSPATH, then the driver can automatically load the Oracle PKI Provider in the following way:

    java –cp oraclepki.jar:ojdbc8.jar –D javax.net.ssl.keyStore=/path/to/wallet/cwallet.sso MyApp
    

    Similarly, for a specified value of the oracle.net.wallet_location connection property, the driver can automatically load the Oracle PKI Provider in the following way:

    java –cp .:oraclepki.jar:ojdbc8.jar –D oracle.net.wallet_location=file:/path/to/wallet/cwallet.sso MyApp
    
  3. Based on the above alternative, if you just specify the wallet property, you can apparently omit all the javax.net.ssl lines?

    .option("oracle.net.wallet_location", "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/ssl_wallet)))")
    
  4. As you found, using an Oracle 12.1 client to connect to an Oracle 18+ database is problematic with SSL. Driver incompatibility issues are showing up as obscure errors (even though this old driver is supposedly "forward compatible".

  5. Oracle 12c drivers don't support all the enhanced features in connection strings (Easy Connect Plus), but some connection properties like wallet_location were added in the 19c JDBC driver. So using a fully specified self-contained ezconnect connection URL like that may make things easier.

  6. Another option is to use Java JKS TrustStore instead of Oracle Wallets.

  7. If you still see issues, use javax.net.debug=all to capture tracing details.

  8. Always test your connections with tnsping and sqlplus before testing in pyspark.

2
On

Baically this is how we are able to solve it. One important thing to remember here is the SSO file must be present on all nodes where Spark will be running (spark's executor node)

    val SOURCE_DF = spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@...full string here")
        .option("oracle.net.wallet_location", "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/path/to/sso/dir)))")
        ...
        ...

In case you need to pass additional details you can add more .options parameters

   .option("oracle.net.tns_admin", "oracle/tns/file/path"))
   .option("javax.net.ssl.trustStoreType", "sso")