We are trying to transfer data between two clusters that have cross-realm authentication enabled using MIT KDC and Ranger. DistCp works without any issues, but a Spark application in cluster A that is supposed to write data to cluster B's HDFS (Kerberized) does not.
The application works in LOCAL mode and the writes to cluster B's HDFS succeed. But when we run the same application in YARN-CLUSTER mode, it fails with an AccessControlException (Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]).
While debugging, we observed that in yarn-cluster mode the FileSystem object is created with SIMPLE authentication and its principal is just the username, whereas in LOCAL mode it is created with KERBEROS authentication and the proper principal.
We understand that YARN uses delegation tokens to start the executors and the driver. We are not sure if we are missing any configuration in Spark, HDFS, or YARN.
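For reference, the SIMPLE-vs-KERBEROS difference above shows up with a check along these lines on the executor (a minimal sketch; getAuthenticationMethod() and getCredentials() are standard hadoop-common UserGroupInformation APIs):

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
// yarn-cluster mode prints SIMPLE and the bare username;
// LOCAL mode prints KERBEROS and the full principal.
System.out.println("auth method: " + ugi.getAuthenticationMethod());
System.out.println("user: " + ugi.getUserName());
// Delegation tokens handed to this container by YARN, if any:
for (Token<?> token : ugi.getCredentials().getAllTokens()) {
    System.out.println("token: " + token.getKind() + " for " + token.getService());
}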
Since these are cross-realm KDCs, I am using cluster A's principal and keytab to submit the Spark application.
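For context, the cross-realm trust between the two MIT KDCs follows the usual krb5.conf pattern, roughly as sketched below (realm names are illustrative; CLUSTERB.EXAMPLE.COM stands in for cluster B's actual realm, and the matching krbtgt/CLUSTERB.EXAMPLE.COM@EXAMPLE.COM principal exists in both KDCs with the same key):

[capaths]
EXAMPLE.COM = {
    CLUSTERB.EXAMPLE.COM = .
}
CLUSTERB.EXAMPLE.COM = {
    EXAMPLE.COM = .
}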
Below are the properties we have set for spark-submit:
SPARK:
spark.yarn.access.namenodes=hdfs://mycluster02
spark.authenticate=true
spark.yarn.access.hadoopFileSystems=hdfs://mycluster02
[email protected]
spark.yarn.keytab=user.keytab
YARN:
hadoop.registry.client.auth=kerberos
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled=false
and all other YARN Kerberos principals and keytabs are also set.
HDFS:
hadoop.security.authentication=kerberos
and all other basic Kerberos configuration is in place.
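Since DistCp works, we assume the principal mapping on cluster B is already in place; for completeness, a typical hadoop.security.auth_to_local rule in cluster B's core-site.xml that maps cluster A's realm to local usernames looks like this (illustrative sketch):

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//
    DEFAULT
  </value>
</property>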
Below is the code, replicated from the application, that runs on the executor to create the FileSystem object:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load cluster B's client configuration from the directory passed in as args[0]
Configuration conf = new Configuration();
conf.addResource(new Path(args[0] + "/core-site.xml"));
conf.addResource(new Path(args[0] + "/hdfs-site.xml"));
conf.set("hadoop.security.authentication", "kerberos");
// Create the FileSystem object and list the root directory
FileSystem fs = FileSystem.get(conf);
FileStatus[] fsStatus = fs.listStatus(new Path("/"));
The spark-submit command:
spark-submit --name "HDFS_APP_DATA" --master yarn-cluster \
  --conf "spark.yarn.access.namenodes=hdfs://mycluster02" \
  --conf "spark.authenticate=true" \
  --conf "spark.yarn.access.hadoopFileSystems=hdfs://mycluster02" \
  --conf "[email protected]" \
  --conf "spark.yarn.keytab=/home/user/hdfs_test/user.princ.keytab" \
  --driver-memory 2g --executor-memory 3g --num-executors 1 --executor-cores 1 \
  --class com.test.spark.kafka.batch.util.HDFSApp \
  spark-batch-util-jar-with-dependencies.jar /config_files/
Exception:
18/05/23 16:15:38 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hostname.org, partition 1,PROCESS_LOCAL, 2092 bytes)
18/05/23 16:15:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hostname.org): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "hostname.org/xx.xx.xx.xx"; destination host is: "hostname.org":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:785)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy12.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:625)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy13.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2126)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:919)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992)
at com.example.spark.kafka.batch.util.HDFSApp$1.call(HDFSApp.java:51)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:332)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$34.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$34.apply(RDD.scala:919)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1857)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1857)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:720)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 34 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:758)
... 37 more
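For completeness, one direction we can test is an explicit keytab login inside the executor before creating the FileSystem object, so the UGI does not fall back to SIMPLE. A minimal sketch, assuming the keytab is shipped to the containers (e.g. via --files) and using hadoop-common's standard UserGroupInformation APIs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

Configuration conf = new Configuration();
conf.addResource(new Path(args[0] + "/core-site.xml"));
conf.addResource(new Path(args[0] + "/hdfs-site.xml"));
conf.set("hadoop.security.authentication", "kerberos");
// Make UGI read the security settings from this configuration
UserGroupInformation.setConfiguration(conf);
// "user.keytab" assumes the keytab file was shipped with the job, e.g. via --files
UserGroupInformation.loginUserFromKeytab("[email protected]", "user.keytab");
FileSystem fs = FileSystem.get(conf);

Is an explicit login like this expected to be necessary, or should the delegation tokens obtained via spark.yarn.access.namenodes be enough for the executors?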