When I try to read data from Elasticsearch using the esRDD("index") function in elasticsearch-spark, I get results of type org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])]. When I check the values, they are all of type AnyRef. However, the Elasticsearch site says:
elasticsearch-hadoop automatically converts Spark built-in types to Elasticsearch types (and back)
My dependencies are:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0"
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.4.0"
Am I missing something? And how can I convert the types in a convenient way?
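For reference, a minimal sketch of the call described above. This assumes a SparkContext `sc` whose SparkConf already points at the cluster (the es.nodes/es.port settings shown are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._   // adds esRDD to SparkContext

val conf = new SparkConf()
  .setAppName("es-read")
  .setMaster("local[*]")
  .set("es.nodes", "localhost")    // placeholder cluster address
  .set("es.port", "9200")
val sc = new SparkContext(conf)

// "index" is a placeholder index name
val rdd = sc.esRDD("index")
// rdd: RDD[(String, scala.collection.Map[String, AnyRef])]
// every document field in the Map is typed AnyRef, as observed
```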
OK, I found a solution. If you use esRDD, all type information is lost. It is better to use the Spark SQL (DataFrame) API instead. You can configure ES through option calls on the reader; if you have already configured it (e.g. in the SparkConf), the option calls can be omitted. The data returned is a DataFrame, and the data types are preserved (converted to sql.DataTypes) in the schema, as long as the conversion is supported by elasticsearch-spark. And now you can do whatever you want.
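A hedged sketch of the DataFrame read, assuming a local cluster; "localhost", "9200", and the index name "index" are placeholders for your setup:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("es-read-df")
  .master("local[*]")
  .getOrCreate()

// elasticsearch-spark registers the "org.elasticsearch.spark.sql" data source
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "localhost")   // can be omitted if already set in SparkConf
  .option("es.port", "9200")
  .load("index")

// The schema now carries concrete sql.DataTypes (StringType, LongType, ...)
// instead of AnyRef, for every field whose conversion the connector supports.
df.printSchema()
```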