Cancelling long-running Elasticsearch tasks times out


My _search requests had gradually been getting slower and slower, to the point of 504 gateway timeouts. Then I saw dozens of extremely long-running indices:data/read/search tasks with no end in sight, so I tried to cancel them using POST _tasks/_cancel?actions=*search (note that I only have one node of interest, so I didn't need the &nodes=... param).
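
For reference, a rough curl equivalent of what I ran (my-es-instance is a placeholder for the real endpoint):

# list only search tasks, with per-task details, to spot the long runners
curl -s "https://my-es-instance/_tasks?actions=*search&detailed=true"

# then the cancel call
curl -s -X POST "https://my-es-instance/_tasks/_cancel?actions=*search"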

This only resulted in another (cancel) task being registered, and now even my GET _tasks and GET _cat/tasks?v requests are timing out.

I'm wondering whether it's possible to

  1. set a cap on the running_time_in_nanos of all search tasks and/or auto-cancel any that exceed it (see the sketch after this list)
  2. force-cancel the tasks without having to restart the ES service when the Tasks API itself is timing out
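
For point 1, one option I'm considering (a sketch, assuming ES 7.x+ and that a best-effort timeout is acceptable) is the dynamic search.default_search_timeout cluster setting, which makes searches stop and return partial results (timed_out: true) once the timeout elapses, rather than hard-killing the task:

# apply a global default timeout to all searches (a per-request ?timeout=... overrides it)
curl -s -X PUT "https://my-es-instance/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search.default_search_timeout": "30s"}}'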

Side note: I already have a health-check bash script:

if [ "$(curl -s -m 20 https://my-es-instance | grep -c "You Know, for Search")" -ne 1 ];
then
  echo "$(date "+%F %T") app not responding" >> "$my_log_file"
  ...

but it doesn't account for the case where the root endpoint (GET /) is still responding while the _search endpoints are not.
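
A rough improvement I'm thinking about (assuming a bare GET /_search?size=0 is cheap enough to run on every check) would be probing the search path directly and checking the timed_out flag in the response:

resp=$(curl -s -m 20 "https://my-es-instance/_search?size=0")
if ! echo "$resp" | grep -q '"timed_out":false'; then
  echo "$(date "+%F %T") _search not responding" >> "$my_log_file"
fi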

What are the best practices here?
