Elasticsearch: why does "SELECT * FROM my_index LIMIT 1" take so long?


I have a decently sized ES index (10 TB, close to 10B rows) with 50 shards spread across 50 machines (1 shard each). The machines are top-tier (the largest you can get on AWS). RAM per ES instance is set to 30 GB.

Whenever I run a very simple query such as:

POST /my_index/_search
{
  "size": 1, 
  "query": {
    "match_all": {}
  }
}

It takes between 2 and 20+ seconds (I have even gotten a 502):

Response

{
  "took" : 17584,
  "timed_out" : false,
  "_shards" : {
    "total" : 50,
    "successful" : 50,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {

Is there a way to make it faster? I noticed that the LIMIT clause does not seem to work well in Elasticsearch (or rather, I'm probably not using it right).

There is 1 best solution below

Interesting ES performance question. Please clarify a few things first; then, based on my understanding, I will try to explain what is going on.

  1. When you say RAM per ES instance is set to 30 GB, I assume you mean the ES heap size is 30 GB, not the node's total RAM? The rule of thumb is to assign 50% of the node's RAM to the ES heap, and the heap shouldn't exceed 32 GB.
  2. Have you tried this query during both peak and off-peak hours, and does your 2 to 20+ second range cover both time frames?
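The heap rule of thumb from the first point can be written down as a small calculation. This is only a sketch of that rule, not an official sizing tool: the function name `recommended_heap_gb` and the default cap of 31 GB (picked simply to stay under the 32 GB limit mentioned above) are assumptions made for illustration.

```python
def recommended_heap_gb(node_ram_gb: float, heap_cap_gb: float = 31.0) -> float:
    """Rule-of-thumb heap size: half of the node's RAM, capped below 32 GB.

    heap_cap_gb is an assumed default chosen to stay under the 32 GB limit;
    adjust it to whatever ceiling you actually want to enforce.
    """
    if node_ram_gb <= 0:
        raise ValueError("node_ram_gb must be positive")
    return min(node_ram_gb / 2, heap_cap_gb)


if __name__ == "__main__":
    # Small nodes get half their RAM; large nodes hit the cap.
    for ram in (16, 64, 256):
        print(f"{ram} GB node -> {recommended_heap_gb(ram)} GB heap")
```

If the 30 GB in the question is the heap, it is consistent with this rule on any node with 60 GB of RAM or more; if it is the node's total RAM, the heap would be closer to 15 GB.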

Now, a few recommendations to speed up your search query:

  1. Try to reduce the number of shards for your index. It currently has 50 shards, which means the coordinating node has to collect search results from 50 nodes (in your case, 1 shard per node), and this inter-node communication might be taking quite some time.

  2. I have written some short tips on improving search performance; see whether you can apply them to your cluster and index.
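To illustrate why the shard count in the first recommendation can matter, here is a toy simulation of the fan-out: the coordinating node cannot answer before the slowest shard has responded, so its latency is at least the maximum of the per-shard latencies. This is not a measurement of Elasticsearch. The exponential latency distribution, the 100 ms mean, and the function name `mean_fan_out_latency_ms` are all assumptions, and the model ignores that fewer shards also means larger shards with more work each.

```python
import random


def mean_fan_out_latency_ms(num_shards: int,
                            mean_shard_ms: float = 100.0,
                            trials: int = 2000,
                            seed: int = 0) -> float:
    """Average time until the slowest of num_shards shards has answered.

    Assumes independent, exponentially distributed shard latencies
    (a toy model, chosen only to show the effect of the fan-out).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The coordinating node waits for every shard, i.e. for the slowest one.
        total += max(rng.expovariate(1.0 / mean_shard_ms)
                     for _ in range(num_shards))
    return total / trials


if __name__ == "__main__":
    for shards in (1, 5, 50):
        print(f"{shards:>2} shards: ~{mean_fan_out_latency_ms(shards):.0f} ms")
```

Under these assumptions the average wait grows with the number of shards even though each individual shard is no slower, which is the tail-latency effect the recommendation is pointing at.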