Spark Launcher: Can't see the complete stack trace for failed SQL query

Question

Asked by sbrk on 25 May 2020 at 11:22 · 851 views

I'm using SparkLauncher to connect to Spark in cluster mode on top of YARN. I'm running some SQL code from Scala like this:

import scala.util.control.NonFatal

def execute(code: String): Unit = {
    try {
      val resultDataframe = spark.sql(code)
      resultDataframe.write.json("s3://some/prefix")
    } catch {
      case NonFatal(f) =>
        log.warn(s"Fail to execute query $code", f)
        log.info(f.getMessage, getNestedStackTrace(f, Seq[String]()))
    }
}

def getNestedStackTrace(e: Throwable, msg: Seq[String]): Seq[String] = {
   if (e.getCause == null) return msg
   getNestedStackTrace(e.getCause, msg ++ e.getStackTrace.map(_.toString))
}

Now when I run a query that should fail through the execute() method, for example querying a partitioned table without a partition predicate (select * from partitioned_table_on_dt limit 1;), I get an incorrect stack trace back.

Correct stack trace when I run spark.sql(code).write.json() manually from spark-shell:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *(1) LocalLimit 1
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
...

Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: No partition predicate found for partitioned table
 partitioned_table_on_dt.
 If the table is cached in memory then turn off this check by setting
 hive.mapred.mode to nonstrict
    at org.apache.spark.sql.hive.execution.HiveTableScanExec.prunePartitions(HiveTableScanExec.scala:155)
...

org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
...

Incorrect stack trace from the execute() method above:

Job Aborted: 
"org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)",
"org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)",
"org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)",
"org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)",
...

"org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)",
"org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)",
"org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)",
"org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)",
...

The spark-shell stack trace has three nested exceptions: a SparkException, caused by a TreeNodeException, caused by a SemanticException. The traceback I'm seeing with my code only covers the SparkException and the TreeNodeException. The most valuable one, the SemanticException traceback, is missing even though I fetch the nested stack traces in the getNestedStackTrace() method.

Can any Spark/Scala experts tell me what I am doing wrong, or how I can fetch the complete stack trace here with all the exceptions?

**sbrk** · Accepted Answer · 2020-05-26T09:10:13.603000

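The recursive method getNestedStackTrace() had a bug. Its base case checked e.getCause == null, so the recursion returned as soon as it reached the innermost exception, before appending that exception's stack trace. The innermost exception is the root cause (here the SemanticException), which is why its frames were missing. Checking e == null instead lets every exception in the chain contribute its frames: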

def getNestedStackTrace(e: Throwable, msg: Seq[String]): Seq[String] = {
   if (e == null) return msg // this should be e not e.getCause  
   getNestedStackTrace(e.getCause, msg ++ e.getStackTrace.map(_.toString))
}
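
To see the difference without a Spark cluster, here is a minimal, self-contained sketch that runs both versions against a hand-built chain of three exceptions. The object name NestedStackTraceDemo, the method names getNestedStackTraceOriginal and getNestedStackTraceFixed, and the exception messages are made up for illustration. The fixed variant also adds a "class: message" header line per exception, which is an extra on top of the answer above, not part of it.

```scala
import scala.annotation.tailrec

object NestedStackTraceDemo {

  // Version from the question: returns as soon as the current exception
  // has no cause, so the innermost exception's frames are never appended.
  @tailrec
  def getNestedStackTraceOriginal(e: Throwable, msg: Seq[String]): Seq[String] =
    if (e.getCause == null) msg
    else getNestedStackTraceOriginal(e.getCause, msg ++ e.getStackTrace.map(_.toString))

  // Fixed version: stops only once the whole cause chain has been consumed.
  // The header line per exception is an illustrative addition.
  @tailrec
  def getNestedStackTraceFixed(e: Throwable, msg: Seq[String] = Seq.empty): Seq[String] =
    if (e == null) msg
    else {
      val header = s"${e.getClass.getName}: ${e.getMessage}"
      getNestedStackTraceFixed(e.getCause, (msg :+ header) ++ e.getStackTrace.map(_.toString))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical three-level chain standing in for
    // SparkException -> TreeNodeException -> SemanticException.
    val root   = new IllegalStateException("No partition predicate found")
    val middle = new RuntimeException("execute, tree", root)
    val outer  = new RuntimeException("Job aborted.", middle)

    val original = getNestedStackTraceOriginal(outer, Seq.empty)
    val fixed    = getNestedStackTraceFixed(outer)

    println(s"original version collected ${original.size} lines")
    println(s"fixed version collected ${fixed.size} lines")
    fixed.filter(_.contains(": ")).foreach(println) // header lines only
  }
}
```

If the goal is only to log the full trace as text, the JDK can also render it directly: Throwable.printStackTrace(PrintWriter) writes every exception in the chain, including the "Caused by:" sections, and can be pointed at a java.io.StringWriter to capture the result as a string.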
