Oozie Spark HBase job, invalid credentials exception


I have an issue with Kerberos credentials. The work runs on a cluster, and the keytabs are provided on each datanode. It is an Oozie workflow shell action whose purpose is to write to HBase from a Spark job. If the job is run in cluster mode without Oozie, it works as expected. With Oozie, it throws the following exception:

WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/11/26 15:30:24 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
    at java.security.AccessController.doPrivileged(Native Method)

The Oozie shell action looks like this:

<action name="spark-hbase" retry-max="${retryMax}" retry-interval="${retryInterval}">
<shell xmlns="uri:oozie:shell-action:0.3">
  <exec>submit.sh</exec>
  <env-var>QUEUE_NAME=${queueName}</env-var>
  <env-var>PRINCIPAL=${principal}</env-var>
  <env-var>KEYTAB=${keytab}</env-var>
  <env-var>VERBOSE=${verbose}</env-var>
  <env-var>CURR_DATE=${firstNotNull(currentDate, "")}</env-var>
  <env-var>DATA_TABLE=${dataTable}</env-var>
  <file>bin/submit.sh</file>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>

The spark-submit command in submit.sh looks like this:

CLASS="App class location"
JAR="compiled jar file"

HBASE_JARS="HBase jars"
HBASE_CONF='hbase-site.xml location'

HIVE_JARS="Hive jars"
HIVE_CONF='tez-site.xml location'

HADOOP_CONF='hdfs-site.xml location'

SPARK_BIN_DIR="spark2-client bin directory location"

${SPARK_BIN_DIR}/spark-submit \
  --class ${CLASS} \
  --principal "${PRINCIPAL}" \
  --keytab "${KEYTAB}" \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 10G \
  --executor-memory 4G \
  --num-executors 10 \
  --conf spark.default.parallelism=24 \
  --jars ${HBASE_JARS},${HIVE_JARS} \
  --files ${HBASE_CONF},${HIVE_CONF},${HADOOP_CONF} \
  --conf spark.ui.port=4042 \
  --conf "spark.executor.extraJavaOptions=-verbose:class -Dsun.security.krb5.debug=true" \
  --conf "spark.driver.extraJavaOptions=-verbose:class -Dsun.security.krb5.debug=true" \
  --queue "${QUEUE_NAME}" \
  ${JAR} \
    --app.name "spark-hbase" \
    --data.table "${DATA_TABLE}" \
    --verbose

Answer by Ami Ranjan:

Creating a soft link on every node in the cluster may not always be feasible. We resolved the issue by adding the HBase configuration directory to the Spark configuration: we override the SPARK_CONF_DIR environment variable in the shell script, before the spark-submit command.

export SPARK_CONF_DIR=/etc/spark2/conf:/etc/hbase/conf
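
For context, here is a minimal sketch of how that export might sit at the top of submit.sh, ahead of the existing spark-submit call. The helper name build_spark_conf_dir and the check for hbase-site.xml are illustrative assumptions, not part of the original answer. The two directories are the ones quoted above and may differ on other distributions. It also assumes, as the answer reports, that the colon-separated value is accepted on your Spark version.

```shell
#!/usr/bin/env bash
# Sketch only: build a colon-separated SPARK_CONF_DIR that also contains the
# HBase configuration directory, then export it before spark-submit runs.

# Hypothetical helper (not from the original answer): joins the two
# directories and warns on stderr if hbase-site.xml is not where we expect.
build_spark_conf_dir() {
  local spark_conf="$1"
  local hbase_conf="$2"
  if [ ! -f "${hbase_conf}/hbase-site.xml" ]; then
    echo "warning: ${hbase_conf}/hbase-site.xml not found on this node" >&2
  fi
  printf '%s:%s\n' "${spark_conf}" "${hbase_conf}"
}

# Paths taken from the answer; adjust them for your cluster layout.
SPARK_CONF_DIR="$(build_spark_conf_dir /etc/spark2/conf /etc/hbase/conf)"
export SPARK_CONF_DIR

# The existing spark-submit command from submit.sh would follow here, unchanged.
```

The warning is only a convenience: an Oozie shell action can run on any node, so it helps to see in the action's stderr whether the HBase configuration was actually present on the node that launched the job.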