My `_search` requests had been getting gradually slower, to the point of 504 gateway timeouts. Then I saw dozens of very long-running `indices:data/read/search` tasks with no end in sight, so I tried to cancel them using `POST _tasks/_cancel?actions=*search` (note that I only have one node of interest, so I didn't need the `&node=...` param). This only resulted in another (`cancel`) task being registered, and now even my `GET _tasks` and `GET _cat/tasks?v` requests are timing out.
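For context, the kind of cleanup I'd like to automate looks roughly like the sketch below (same `https://my-es-instance` endpoint as my health-check script further down; `jq` and the 60-second threshold are just placeholders I picked):

```bash
#!/usr/bin/env bash
# List search tasks that have been running longer than ~60s and cancel them
# one by one instead of firing a blanket *search cancel.
ES="https://my-es-instance"
THRESHOLD_NANOS=$((60 * 1000000000))   # 60 seconds in nanoseconds

# The task list itself can be slow when the node is overloaded, hence the
# generous curl timeout.
curl -s -m 60 "$ES/_tasks?actions=*search" |
  jq -r --argjson limit "$THRESHOLD_NANOS" '
    .nodes[].tasks
    | to_entries[]
    | select(.value.running_time_in_nanos > $limit)
    | .key' |
while read -r task_id; do
  echo "cancelling $task_id"
  curl -s -m 20 -X POST "$ES/_tasks/$task_id/_cancel"
  echo
done
```

But when the node is this overloaded, even listing the tasks times out, so a client-side loop like this doesn't really help.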
I'm wondering whether it's possible to:

- set a cap on the `running_time_in_nanos` attribute of all `search` tasks and/or auto-cancel the ones that exceed it (see the sketch after this list)
- force-cancel the tasks without having to restart the ES service when the Tasks API itself is timing out
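For the first point, the closest thing I've found so far is the best-effort search timeout, either cluster-wide via the dynamic `search.default_search_timeout` setting or per request. As far as I understand, it only stops hit collection and flags the response as `timed_out` rather than hard-killing the task, which is why I'm asking about a real cap (`my-index` and `30s` below are placeholders):

```bash
# Cluster-wide, best-effort cap on search time. Shards stop collecting hits
# once the timeout elapses and the response is flagged "timed_out": true,
# but a stuck task is not necessarily killed by it.
curl -s -m 20 -X PUT "https://my-es-instance/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "search.default_search_timeout": "30s" } }'

# The same cap can also be set per request:
curl -s -m 20 "https://my-es-instance/my-index/_search" \
  -H 'Content-Type: application/json' \
  -d '{ "timeout": "30s", "query": { "match_all": {} } }'
```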
Side note: I already have a health-check bash script:

```bash
if [ $(curl -s -m 20 https://my-es-instance | grep "You Know, for Search" | wc -l) -ne 1 ];
then
  echo "$(date "+%F %T") app not responding" &>> "$my_log_file"
  ...
```
but it doesn't account for the fact that the root endpoint (`GET /`) can be responding while the `_search` endpoints are not.
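Something like the sketch below is probably closer to what the check should do (again, `my-index` and the timeout values are placeholders), but I'd rather fix the underlying problem than just detect it:

```bash
# Probe an actual search instead of only GET /. curl reports HTTP code 000
# when the request itself times out, which also trips the check.
status=$(curl -s -o /dev/null -w '%{http_code}' -m 20 \
  "https://my-es-instance/my-index/_search?size=0&timeout=10s")
if [ "$status" -ne 200 ]; then
  echo "$(date "+%F %T") _search not responding (HTTP $status)" >> "$my_log_file"
fi
```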
What are the best practices here?