The code below is built as a jar and executed with the spark-submit command through PuTTY. It works fine.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("ABC")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

import sqlContext.implicits._

sqlContext.sql("query")

But when I run the same code through SparkLauncher (Spark 1.6, master yarn-cluster), it throws the error shown after the launcher snippet below.
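
The launcher call is along these lines (a sketch only; the jar path and main class are placeholders, not the actual values):

import org.apache.spark.launcher.SparkLauncher

// Launch the same assembly jar against YARN in cluster mode and wait
// for the spark-submit child process to finish.
val launcher = new SparkLauncher()
  .setAppResource("/path/to/abc-assembly.jar") // placeholder path
  .setMainClass("com.example.ABC")             // placeholder class
  .setMaster("yarn-cluster")
  .setAppName("ABC")

val process = launcher.launch()
process.waitFor()

The failing run then produces: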

java.io.IOException: Cannot run program "/etc/spark/conf.cloudera.CD-SPARK_ON_YARN-brkvSOzr/yarn-conf/topology.py" (in directory "/data4/yarn/nm/usercache/ppmingusrdev/appcache/application_14823231312_123/container_14866508534534_144_01_000004"): error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:548)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
    at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
    at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
    at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
    at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$handleAllocatedContainers$2.apply(YarnAllocator.scala:337)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$handleAllocatedContainers$2.apply(YarnAllocator.scala:336)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.deploy.yarn.YarnAllocator.handleAllocatedContainers(YarnAllocator.scala:336)
    at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:236)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:368)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)

There is 1 answer below

We were dealing with the same error today after recently upgrading Cloudera Manager to 5.10. In our case it is caused by a bug in that version of CM.

It means that the worker node running your ApplicationMaster (not the edge node running your driver, if you are in yarn-client mode) does not have the Spark Gateway role, and therefore has no spark-on-yarn conf directory. The Hadoop configuration shipped with the job points at topology.py inside that node-local directory (via net.topology.script.file.name), so rack resolution fails on any node where that directory was never deployed.

The workaround for us was to give every single node the Spark Gateway role and redeploy client config.

BTW your job should still run, but with reduced or no data locality (so probably much slower).