I have one Spark (version 1.3.1)
application. In which, I am trying to convert one Java bean RDD
JavaRDD<Message>
into Dataframe, it has many fields with different-different Data types (Integer, String, List, Map, Double).
But when, I am executing my Code.
messages.foreachRDD(new Function2<JavaRDD<Message>,Time,Void>(){
@Override
public Void call(JavaRDD<Message> arg0, Time arg1) throws Exception {
SQLContext sqlContext = SparkConnection.getSqlContext();
DataFrame df = sqlContext.createDataFrame(arg0, Message.class);
df.registerTempTable("messages");
I got this error
/06/12 17:27:40 INFO JobScheduler: Starting job streaming job 1434110260000 ms.0 from job set of time 1434110260000 ms
15/06/12 17:27:40 ERROR JobScheduler: Error running job streaming job 1434110260000 ms.1
scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465)
If
Message
has many different fields likeList
and the error message points to aList
match error than that is the is the issue. Also if you look at the source code you can see thatList
is not in the match.But beside digging around in the source code this is also very clearly stated in the documentation here under the Java tab:
You may want to switch to Scala as it seems to be supported there:
So the solution is either to use Scala or remove the
List
from you JavaBean.As a last resort you can take a look at SQLUserDefinedType to define how that
List
should be persisted, maybe it's possible to hack it together.