So I have cluster node on Google Kubernetes Engine and I do spark-submit to run some spark job. (I didn't use spark-submit exactly, I launch the submit using java code, but they are essentially invoking the same Scala class, which is SparkSubmit.class)
And in my case, I have two clusters I can connect with on my laptop by using the gcloud command.
e.g.
gcloud container clusters get-credentials cluster-1gcloud container clusters get-credentials cluster-2
when I connect to cluster-1, and spark-submit is submitting to cluster-1, it works. But when I ran the second gcloud command and still submitting to cluster-1, it won't work, and the following stack track appears (abridged version)
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:194)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
I've been searching for a while without success. The main issue is probably when spark-submit launches, it searches for some sort of credential on the local machine relating to Kubernetes, and the changing context by previous two gcloud command messed it up.
I'm just curious, when we do spark-submit, how exactly does the remote K8s server knows who I am? What's the auth process involved in all this?
Thank you in advance.
A
PKIX path building failederror means Java tries to open an SSL connection, but was unable to find a chain of certificates (the path) that validates the certificate the server offered.The code you're running from does not trust the certificate offered by the cluster. The clusters are probably using self-signed certificates.
Run from the command line, Java looks for the chain in the truststore located at jre/lib/security/cacerts. Run as part of a larger environment (Tomcat, Glassfish, etc) it will use that environment's certificate truststore.
Since you started spark_submit manually, you're likely missing an option to specify where to find the keystore (server certificate and private key) and truststore (CA certificates). These are usually specified as:
If you're running on Java 9+, you will also need to specify the StoreType:
Up through Java 8, the keystores were always JKS. Since Java 9 they can also be PKCS12.
In the case of a self-signed key, you can export it from the keystore and import it into the truststore as a trusted certificate. There are several sites with instructions for how to do this. I find Jakob Jenkov's site to be quite readable.