Elasticsearch is using too much CPU. Today I found these logs. Can anyone help me with this?
[2015-06-24 16:16:52,309][WARN ][cluster.action.shard ] [Bereet] [logstash-2015.06.24][0] received shard failed for [logstash-2015.06.24][0], node[ucXcuxuQQTSz_leAzWq6mQ], [P], s[INITIALIZING], indexUUID [ieIR8uWLQHycnEC_szsNZQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2015.06.24][0] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: ElasticsearchIllegalArgumentException[No version type match [99]]; ]]
[2015-06-24 16:16:52,332][WARN ][cluster.action.shard ] [Bereet] [logstash-2015.06.24][0] received shard failed for [logstash-2015.06.24][0], node[ucXcuxuQQTSz_leAzWq6mQ], [P], s[INITIALIZING], indexUUID [ieIR8uWLQHycnEC_szsNZQ], reason [master [Bereet][ucXcuxuQQTSz_leAzWq6mQ][iZ23cth9hh5Z][inet[/10.162.41.162:9300]] marked shard as initializing, but shard is marked as failed, resend shard failure]
[2015-06-24 16:16:52,339][WARN ][index.engine ] [Bereet] [logstash-2015.06.24][4] failed to sync translog
[2015-06-24 16:16:52,345][WARN ][indices.cluster ] [Bereet] [[logstash-2015.06.24][4]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2015.06.24][4] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
... 4 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No version type match [116]
at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
at org.elasticsearch.index.translog.Translog$Create.readFrom(Translog.java:376)
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
... 5 more
For me this happened after a system crash (the disk had enough space).
There's now an official way to fix a corrupted translog using the bundled elasticsearch-translog tool, but you may lose data that was not yet indexed, so I suggest backing up the translog first (e.g. for compliance reasons; maybe with enough effort someone will want to analyze it at some point).
First confirm the issue by running:
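One way to check is the cluster allocation explain API, which, when called without a body, reports on an arbitrary unassigned shard. This is a minimal sketch: the `ES_URL` variable, its `http://localhost:9200` default, and the function name are assumptions for illustration.

```shell
# Assumed HTTP endpoint of the node; adjust to your setup.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask the cluster why a shard is unassigned.
# With no request body, Elasticsearch picks an arbitrary unassigned shard.
explain_unassigned() {
  curl -s "${ES_URL}/_cluster/allocation/explain?pretty"
}

# Usage: explain_unassigned | grep -i translog
```

If the translog is the culprit, the explanation should mention a translog corruption error similar to the one in the logs above.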
An easier way to find affected shards:
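A sketch using the `_cat/shards` API, filtered down to shards in the `UNASSIGNED` state. The `ES_URL` default and the function name are assumptions; the column names are the ones `_cat/shards` accepts in its `h` parameter.

```shell
# Assumed HTTP endpoint of the node; adjust to your setup.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print index, shard number, primary/replica flag, state and the
# reason for every shard that is currently unassigned.
list_unassigned_shards() {
  curl -s "${ES_URL}/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | awk '$4 == "UNASSIGNED"'
}

# Usage: list_unassigned_shards
```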
Stop Elasticsearch.
Suppose shard 2 of index qABaulDIRJyT06G3rBFfrC is affected (your paths may vary). Run:
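The sketch below copies the translog aside and then truncates it with `elasticsearch-translog truncate -d`. The install, data, and backup locations (`ES_HOME`, `ES_DATA`, `BACKUP_DIR`), the `nodes/0` layout, and the function names are assumptions for a typical package install, so check them against your own node before running anything.

```shell
# Assumed locations; adjust to your installation.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"
ES_DATA="${ES_DATA:-/var/lib/elasticsearch}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/translog-backup}"

# Build the translog path for an index directory name ($1) and shard number ($2).
translog_dir() {
  printf '%s/nodes/0/indices/%s/%s/translog\n' "$ES_DATA" "$1" "$2"
}

# Back up the translog, then truncate it.
# Truncating discards operations that were not yet committed to the index.
backup_and_truncate() {
  dir="$(translog_dir "$1" "$2")"
  mkdir -p "$BACKUP_DIR" || return 1
  cp -a "$dir" "$BACKUP_DIR/$1-$2-translog" || return 1
  "$ES_HOME/bin/elasticsearch-translog" truncate -d "$dir"
}

# Usage (with Elasticsearch stopped):
#   backup_and_truncate qABaulDIRJyT06G3rBFfrC 2
```

Newer Elasticsearch releases replaced this tool with `elasticsearch-shard`, so the exact command depends on your version.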
Make sure the newly created files belong to the proper user/group in case you ran the tool as root:
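For example, assuming the service runs as user and group `elasticsearch` (the usual default for package installs, but an assumption here, as is the function name):

```shell
# Assumed service account; adjust to your installation.
ES_USER="${ES_USER:-elasticsearch}"
ES_GROUP="${ES_GROUP:-elasticsearch}"

# Recursively hand the given translog directory ($1) back to the service account.
fix_translog_owner() {
  chown -R "${ES_USER}:${ES_GROUP}" "$1"
}

# Usage (the path is an example; use your own translog directory):
#   fix_translog_owner /var/lib/elasticsearch/nodes/0/indices/qABaulDIRJyT06G3rBFfrC/2/translog
```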
Start Elasticsearch. Finally, if it has stopped trying to allocate the shard, run the following command to make it retry:
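A sketch using the cluster reroute API with `retry_failed=true`, which asks the master to retry shards whose allocation failed too many times. The `ES_URL` default and the function name are assumptions.

```shell
# Assumed HTTP endpoint of the node; adjust to your setup.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask the master to retry allocation of shards it has given up on.
retry_failed_shards() {
  curl -s -XPOST "${ES_URL}/_cluster/reroute?retry_failed=true"
}

# Usage: retry_failed_shards
```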
The command to view unassigned shards should no longer return any results.