I am using Elasticsearch's MultiSearch API to make multiple search requests at once for one of my endpoints. My understanding is that these requests are done in parallel, but my endpoint's latency increases with the number of search requests I make through the API (<50). I have two questions:
- Why is this latency increase happening/how does multisearch work behind the scenes? I am new to Elasticsearch, apologies for my lack of knowledge here.
- What are some ways I can improve latency while keeping multisearch?
To provide a more comprehensive answer, it would be good to know your cluster setup.
These requests are indeed done in parallel, but your cluster still has its limits.
What I believe might be happening is that you might not have enough search threads to process that many searches in parallel and your search thread pool start queueing.
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html
So for instance, if you issue a MultiSearch query of let's say 10 search queries where each query would hit 15 shards, this means that this whole query will need 150 search threads in total. And if there are other searches running and the cluster doesn't have available search threads - they will start queueing and eventually might reject if the queue grows too big.
What can you do about it?
number_of_shards
of shards, and indices sizes. Reducing thenumber_of_shards
will require fewer search threads. Find a balance betweennumber_of_shards
and index sizes and their doc count. If there are less than 5M documents, keep everything in a single shard, otherwise, try to have shards of 3M-5M documents, e.g. index with 23M documents could use 5 or 6 shards.