I'm trying to query Elasticsearch with the elasticsearch-spark connector and I want to return only a few results:
For example:
val conf = new SparkConf().set("es.nodes","localhost").set("es.index.auto.create", "true").setMaster("local")
val sparkContext = new SparkContext(conf)
val query = "{\"size\":1}"
println(sparkContext.esRDD("index_name/type", query).count())
However this will return all the documents in the index.
This is actually on purpose. Since the connector performs a parallel (scroll-based) query, it controls the number of documents being returned itself, so any `size` the user specifies in the query is overwritten according to the `es.scroll.limit` setting (see the configuration options).
In other words, if you want to control the size, do so through that setting, as it will always take precedence.
Beware that this parameter applies per shard. So, if you have 5 shards you might get up to five hits if this parameter is set to 1.
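For example, a minimal sketch of setting `es.scroll.limit` in the `SparkConf` (reusing the node address and index/type placeholders from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setMaster("local")
  .set("es.nodes", "localhost")
  .set("es.index.auto.create", "true")
  // Cap the number of documents returned *per shard/task*,
  // instead of putting "size" in the query DSL.
  .set("es.scroll.limit", "1")

val sparkContext = new SparkContext(conf)

// With 5 shards this may still return up to 5 documents, not 1.
println(sparkContext.esRDD("index_name/type").count())
```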
See https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html