I am running simplest Driver alone long running job to reproduce this error
Hadoop Version 2.7.3.2.6.5.0-292
Spark-core version 2_11.2.3.0.2.6.5.0-292
Code:
FileSystem fs = tmpPath.getFileSystem(sc.hadoopConfiguration())
log.info("Path {} is ",path,fs.exists(tmpPath);
Behaviour: My job runs without any problem for ~17-18 hours, After that new keys are released as a part of
HadoopFSDelagationTokenProvider
and job continued to run with newly issued delegated tokens, But within next 1hour of delegation token renew, job fails with error token can't be found in cache. I have gone ahead and generated my own dfs.adddelegationtoken programatically for involved namenodes, and i see same behaviour.Question:
- What are the chances that delegation token gets removed from server and what properties controls this?.
- What server side logs shows this token is about get removed or removed from cache.
Path /test/abc.parquet is true
Path /test/abc.parquet is true
INFO Successfully logged into KDC
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,[email protected](auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelagationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615466 for qa_user on ha:hdfs:hacluster
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,[email protected](auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelagationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615467 for qa_user on ha:hdfs:hacluster
INFO writing out delegation tokens to hdfs://abc/user/qa/.sparkstaging/application_121212.....tmp
INFO delegation tokens written out successfully, renaming file to hdfs://.....
INFO delegation token file rename complete(org.apache.spark.deploy.yarn.security.AMCredentialRenewer)
Scheduling login from keytab in 64799125 millis
Path /test/abc.parquet is true
Path /test/abc.parquet is true
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 31615466 for qa_user) can't be found in cache
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy13.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:620)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
FYI submitted in yarn-cluster-mode with: --keytab /path/to/the/headless-keytab, --principal principalNameAsPerTheKeytab --conf spark.hadoop.fs.hdfs.impl.disable.cache=true Note that Token renewer is issuing new keys and new keys are working too, But it;s somehow gets revoked from server, AM logs doesn't have any clue on the same.