I am writing/reading data between Spark DataFrames and Elasticsearch using the following code:
df.write.format("org.elasticsearch.spark.sql")
  .option("es.nodes", "[MY_ES_IP]")
  .option("es.port", "[MY_ES_PORT]")
  ...
  .option("es.index.auto.create", "true")
  .option("es.resource", "[INDEX]/[DATA]")
  .save("[INDEX]/[DATA]")
and
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.nodes", "[MY_ES_IP]")
  .option("es.port", "[MY_ES_PORT]")
  ...
  .load("[INDEX]/[DATA]")
I want to retrieve the result of a significant_terms aggregation on my data, but I can't find any example of how to achieve this using Spark.
The DSL request I would like to run:
{
  "query": {
    "terms": { [BUCKET REQUEST] }
  },
  "aggregations": {
    "significant_elements": {
      "significant_terms": { "field": "[FIELD NAME]" }
    }
  }
}
Is there any way to achieve that using only the org.elasticsearch.spark.sql library?
EDIT:
I tried solving the problem with:
val myquery =
  """{
    |  "query": { "terms": { [BUCKET REQUEST] } },
    |  "aggregations": {
    |    "significant_elements": {
    |      "significant_terms": { "field": "[FIELD NAME]" }
    |    }
    |  }
    |}""".stripMargin
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.nodes", "[MY_ES_IP]")
  .option("es.port", "[MY_ES_PORT]")
  ...
  .option("query", myquery)
  .option("pushdown", "true")
  .load("[INDEX]/[DATA]")
But the DataFrame I get back contains only the hits matching the terms query. I'm still looking for a way to get the significance score for each [FIELD NAME] value.
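As far as I can tell, the connector pushes the query down but only materializes hits as DataFrame rows; the aggregations section of the Elasticsearch response is not surfaced, so the significant_terms results have to be fetched with a direct REST call instead. Below is a minimal sketch using only the JDK, assuming the cluster is reachable over HTTP; `[MY_ES_IP]`, `[MY_ES_PORT]`, `[INDEX]`, `[BUCKET REQUEST]` and `[FIELD NAME]` are the same placeholders as above, not real values:

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

// "size": 0 skips the hits entirely; only the aggregation is returned.
val body =
  """{
    |  "size": 0,
    |  "query": { "terms": { [BUCKET REQUEST] } },
    |  "aggregations": {
    |    "significant_elements": {
    |      "significant_terms": { "field": "[FIELD NAME]" }
    |    }
    |  }
    |}""".stripMargin

val conn = new URL("http://[MY_ES_IP]:[MY_ES_PORT]/[INDEX]/_search")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Content-Type", "application/json")
conn.setDoOutput(true)
conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))

// The response JSON contains aggregations.significant_elements.buckets,
// each bucket carrying "key", "doc_count", "bg_count" and "score".
val json = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString

From there the `json` string can be parsed with any JSON library (for instance json4s, which ships with Spark) to pull out the per-term `score` values; this is a sketch of the approach, not a tested implementation.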