Use Elasticsearch significant-terms aggregation with SparkSQL


I am writing and reading data between Spark DataFrames and Elasticsearch using the following code:

df.write.format("org.elasticsearch.spark.sql")
  .option("es.nodes", [MY_ES_IP])
  .option("es.port", [MY_ES_PORT])
  ...
  .option("es.index.auto.create", "true")
  .option("es.resource", [INDEX]/[DATA])
  .save([INDEX]/[DATA])

and

val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.nodes", [MY_ES_IP])
  .option("es.port", [MY_ES_PORT])
  ...
  .load([INDEX]/[DATA])

I want to retrieve the result of a significant-terms request on my data, but I can't find any example of how to achieve this using Spark.

The DSL request I would like to run:

{
    "query" : {
        "terms" : {[BUCKET REQUEST]}
    },
    "aggregations" : {
        "significant_elements" : {
            "significant_terms" : { "field" : [FIELD NAME] }
        }
    }
}

Is there any way to achieve that using only the org.elasticsearch.spark.sql lib?

EDIT:

I tried solving the problem with:

val myquery = "{\"query\" : {\"terms\" : [BUCKET REQUEST]},\"aggregations\" : {\"significant_elements\" : {\"significant_terms\" : { \"field\" : [FIELD NAME]}}}}"

val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.nodes", [MY_ES_IP])
  .option("es.port", [MY_ES_PORT])
  ...
  .option("query", myquery)
  .option("pushdown", "true")
  .load([INDEX]/[DATA])

But the resulting DataFrame contains only the documents matched by the bucket request (the query part); the aggregation results are missing. I'm still looking for a way to get the significance score for each [FIELD NAME] value.
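One possible direction, sketched under assumptions rather than offered as a confirmed solution: as far as I can tell, the connector reads documents through the search/scroll API and returns only the hits, so the aggregations part of the query never reaches the DataFrame. A workaround may be to send the aggregation request to the _search endpoint directly and load the JSON response into Spark afterwards. In the sketch below, the object name SignificantTermsSketch, the helpers buildBody and search, the shape of the terms filter (one field with a list of string values), and the plain-HTTP, no-authentication endpoint are all hypothetical choices made for illustration. It needs JDK 11 or later for java.net.http.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical helper object; none of these names come from elasticsearch-hadoop.
object SignificantTermsSketch {

  // Build the search body. "size": 0 asks Elasticsearch to skip the hits and
  // return only the aggregation section.
  // Assumption: the bucket request is a terms filter on a single field, and the
  // values contain no quotes or backslashes (no JSON escaping is done here).
  def buildBody(filterField: String, filterValues: Seq[String], aggField: String): String = {
    val values = filterValues.map(v => "\"" + v + "\"").mkString(", ")
    s"""{
       |  "size": 0,
       |  "query": { "terms": { "$filterField": [$values] } },
       |  "aggregations": {
       |    "significant_elements": {
       |      "significant_terms": { "field": "$aggField" }
       |    }
       |  }
       |}""".stripMargin
  }

  // POST the body to the index's _search endpoint and return the raw JSON response.
  // Assumption: the cluster is reachable over plain HTTP without authentication.
  def search(host: String, port: Int, index: String, body: String): String = {
    val request = HttpRequest.newBuilder()
      .uri(URI.create(s"http://$host:$port/$index/_search"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()
    HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .body()
  }
}
```

The returned string could then be loaded on the driver with spark.read.json(Seq(response).toDS()) (after import spark.implicits._) and the array under aggregations.significant_elements.buckets exploded into rows; each bucket should carry key, doc_count, score and bg_count. This runs as a single request from the driver, so it only makes sense when the aggregation result is small.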
