standalone spark cluster unable to read hdfs from executor on kerberized cluster

181 Views Asked by Sudhir K Bansal At 08 September 2022 at 12:06

Spark 3.x or 2.3 HDP 2.7 After enabling kerberos on HDP cluster, spark standalone cluster is unable to read data from hdfs and hive external tables, it is able to read metastore of hive. If I run this spark-submit with master local, it reads data from hdfs and hive but after specifying master i.e spark-submit --master spark://:7077 abc.py

ERROR:

An error occurred while calling o29.csv.
: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "node2.hadoop.com/10.x.x.x"; destination host is: "master1.hadoop.com":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1413)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:776)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:376)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:796)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:688)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:651)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1452)
    ... 35 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
    at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:376)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
    ... 38 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:162)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 47 more

HDP Nodes master1 and node1 spark nodes node2(master) & node1(worker)

tried following ways: spark-submit --master spark://node2:7077 --deploy-mode client abc.py

spark-submit --master spark://node2:7077 --keytab /tmp/sudhir.keytab --principal [email protected] ~/abc.

spark-submit --master spark://node2:7077 --deploy-mode client --conf spark.kerberos.keytab=/tmp/sudhir.keytab --conf [email protected] ~/abc.py

with this also --conf spark.kerberos.access.hadoopFileSystems=hdfs://master1:8020

Original Q&A

standalone spark cluster unable to read hdfs from executor on kerberized cluster

There are 0 best solutions below

Related Questions in APACHE-SPARK

Related Questions in HADOOP

Related Questions in HIVE

Related Questions in KERBEROS

Related Questions in HDP

Trending Questions

Popular # Hahtags

Popular Questions