Zeppelin: error while running any operation on the resulting DataFrame


I am using a Zeppelin notebook with Spark 1.3.1 and Hadoop 2.6. I can run the entire tutorial shipped with Zeppelin without any issues.

I then created a new notebook to run simple code that fetches data from a Parquet file stored in HDFS on the local machine. Here is the code:

val param_alarms = sqlc.parquetFile("hdfs://localhost/ge_alarm/alarm_param")
param_alarms.registerTempTable("Alarm")
param_alarms.count()

This fails with the following error message:

param_alarms: org.apache.spark.sql.DataFrame = [PARAMETER_ID: int, PARAMETER_NAME: string, OSM_NAME: string, DESCRIPTION: string, TIMESTAMP: bigint, Value: int]
java.lang.NoSuchMethodError: org.json4s.JsonDSL$.string2jvalue(Ljava/lang/String;)Lorg/json4s/JsonAST$JValue;
    at org.apache.spark.sql.types.StructType$$anonfun$jsonValue$5.apply(dataTypes.scala:1065)
    at org.apache.spark.sql.types.StructType$$anonfun$jsonValue$5.apply(dataTypes.scala:1065)
    at org.json4s.JsonDSL$JsonAssoc.$tilde(JsonDSL.scala:86)
    at org.apache.spark.sql.types.StructType.jsonValue(dataTypes.scala:1065)
    at org.apache.spark.sql.types.StructType.jsonValue(dataTypes.scala:1015)
    at org.apache.spark.sql.types.DataType.json(dataTypes.scala:265)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:404)
    at org.apache.spark.sql.parquet.ParquetRelation2.buildScan(newParquet.scala:437)
    at org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$1.apply(DataSourceStrategy.scala:38)
    at org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$1.apply(DataSourceStrategy.scala:38)
    at org.apache.spark.sql.sources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:107)
    at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:34)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:152)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1081)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1079)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1085)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1085)
    at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
    at org.apache.spark.sql.DataFrame.count(DataFrame.scala:827)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
    at $iwC$$iwC$$iwC.<init>(<console>:64)
    at $iwC$$iwC.<init>(<console>:66)
    at $iwC.<init>(<console>:68)
    at <init>(<console>:70)
    at .<init>(<console>:74)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:582)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:558)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:551)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:277)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I also tried adding the following dependency:

%dep 
z.reset()
z.load("org.json4s:json4s-native_2.11:3.2.10")
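Note that json4s-native_2.11 is compiled for Scala 2.11, while Spark 1.3.x is normally built against Scala 2.10, so that artifact cannot line up with the interpreter's classpath. If trying the %dep route, a sketch of a load that at least matches the Scala version (an assumption about the default Spark build, not a confirmed fix) would be:

```
%dep
z.reset()
// Match Spark 1.3.x's Scala 2.10 build and the json4s 3.2.10 it expects
// (artifact coordinates here are an assumption, not a verified fix)
z.load("org.json4s:json4s-native_2.10:3.2.10")
```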

By the way, when I try running %sql select * from Alarm, I get the following error:

java.lang.reflect.InvocationTargetException

No luck with this either. Can someone help?

Update: one of my colleagues finally found the solution. The root cause was an incompatible json4s version; I am posting it here in case others face the same problem. To fix the issue I had to do the following:

Modify zeppelin-server/pom.xml to use swagger-jersey-jaxrs_2.10 version 1.3.11.

Modify zeppelin-engine/pom.xml to use org.reflections version 0.9.9-RC1.

Increase Maven's memory: export MAVEN_OPTS='-Xmx2048m -XX:MaxPermSize=2048m'

Rebuild for Hadoop 2.6 and Spark 1.3: mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
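Pulling the build steps above together, the rebuild would look roughly like this (run from a Zeppelin source checkout, after editing the two pom.xml files; this is a sketch of the commands already listed, not a tested script):

```
# Give Maven enough heap and permgen for the Spark assembly build
export MAVEN_OPTS='-Xmx2048m -XX:MaxPermSize=2048m'

# Build Zeppelin against Spark 1.3 and Hadoop 2.6, skipping tests
mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
```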
