Elasticsearch keeps getting killed (code=killed, signal=KILL). How do I fix it?


I have Elasticsearch set up alongside a Flask server running under Gunicorn. My problem is that Elasticsearch keeps crashing every few hours and I have to restart it.
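For context, this is roughly how the app talks to Elasticsearch (a simplified sketch; the index and field names here are placeholders, not the real ones):

# Simplified sketch of the search setup; index/field names are illustrative.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

# Single local node on the default port; the Flask views share this client.
es = Elasticsearch(["http://localhost:9200"])

def find(clean_query):
    # Build and run a basic match query via elasticsearch-dsl (the same
    # s.execute() call that shows up in the traceback below).
    s = Search(using=es, index="items").query("match", title=clean_query)
    return s.execute()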

Here's what the service status looks like:

● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: enabled)
   Active: failed (Result: signal) since Mon 2021-06-21 06:06:15 UTC; 3h 8min ago
     Docs: https://www.elastic.co
  Process: 17808 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=KILL)
 Main PID: 17808 (code=killed, signal=KILL)
    Tasks: 0 (limit: 2313)
   CGroup: /system.slice/elasticsearch.service

Jun 21 04:56:51 ip-XXX-XX-XX-XXX systemd[1]: Starting Elasticsearch...
Jun 21 04:57:27 ip-XXX-XX-XX-XXX systemd[1]: Started Elasticsearch.
Jun 21 06:06:15 ip-XXX-XX-XX-XXX systemd[1]: elasticsearch.service: Main process exited, code=killed, status=9/KILL
Jun 21 06:06:15 ip-XXX-XX-XX-XXX systemd[1]: elasticsearch.service: Failed with result 'signal'.

And this is what the Flask error log looks like:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 251, in perform_request
    response = self.pool.urlopen(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/util/retry.py", line 386, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/ubuntu/anaconda3/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/ubuntu/anaconda3/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/ubuntu/anaconda3/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/ubuntu/anaconda3/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/home/ubuntu/anaconda3/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f016009b280>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/flask_restful/__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/flask_restful/__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/ubuntu/Automation/searchengine/engine/query.py", line 71, in post
    res = Item().find(clean_query,
  File "/home/ubuntu/Automation/searchengine/engine/models.py", line 136, in find
    res = s.execute()
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch_dsl/search.py", line 715, in execute
    self, es.search(index=self._index, body=self.to_dict(), **self._params)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch/client/utils.py", line 153, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch/client/__init__.py", line 1662, in search
    return self.transport.perform_request(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch/transport.py", line 413, in perform_request
    raise e
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 264, in perform_request
    raise ConnectionError("N/A", str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f016009b280>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f016009b280>: Failed to establish a new connection: [Errno 111] Connection refused)

What can I do to fix it? It used to crash only occasionally some time ago, but now it crashes every few hours. What could be the possible issue? Please let me know if I need to add any more details.
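In the meantime I'm thinking of guarding the search call so the API returns a clearer error while Elasticsearch is down, along these lines (just a sketch; it obviously doesn't address the crashes themselves):

# Sketch: fail fast with a 503 instead of a raw 500 when the node is unreachable.
from elasticsearch.exceptions import ConnectionError as ESConnectionError

def safe_find(clean_query):
    if not es.ping():  # ping() returns False when the node can't be reached
        return {"error": "search backend unavailable"}, 503
    try:
        res = Item().find(clean_query)
        return res.to_dict(), 200
    except ESConnectionError:
        return {"error": "search backend unavailable"}, 503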

EDIT: Elasticsearch logs that I think capture the error:

[2021-06-21T12:45:02,145][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [ip-XXX-XX-XX-XXX] Active license is now [BASIC]; Security is disabled
[2021-06-21T12:45:02,198][INFO ][o.e.g.GatewayService     ] [ip-XXX-XX-XX-XXX] recovered [3] indices into cluster_state
[2021-06-21T12:45:15,401][INFO ][o.e.c.r.a.AllocationService] [ip-XXX-XX-XX-XXX] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[m][0], [dev_m][0]]]).
[2021-06-21T13:29:15,959][INFO ][o.e.m.j.JvmGcMonitorService] [ip-XXX-XX-XX-XXX] [gc][2570] overhead, spent [399ms] collecting in the last [1s]
[2021-06-21T13:40:03,249][WARN ][o.e.m.j.JvmGcMonitorService] [ip-XXX-XX-XX-XXX] [gc][young][3215][14] duration [1.9s], collections [1]/[2.8s], total [1.9s]/[2.7s], memory [160.1mb]->[93.7mb]/[980mb], all_pools {[young] [68mb]->[0b]/[0b]}{[old] [72.1mb]->[90.1mb]/[980mb]}{[survivor] [20mb]->[3.6mb]/[0b]}
[2021-06-21T13:40:03,257][WARN ][o.e.m.j.JvmGcMonitorService] [ip-XXX-XX-XX-XXX] [gc][3215] overhead, spent [1.9s] collecting in the last [2.8s]
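The GC lines above mention a roughly 980mb heap, so once the node is back up I want to check how close it gets to that limit using the same Python client, roughly like this (as far as I understand the nodes stats API):

# Sketch: print per-node JVM heap usage vs. the configured maximum.
stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    heap = node["jvm"]["mem"]
    print(node["name"], heap["heap_used_in_bytes"], "of", heap["heap_max_in_bytes"], "bytes")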