Elasticsearch: UpdateByQuery API response returns wrong status


I am facing an issue with the UpdateByQuery API when trying to update a document that doesn't exist in Elasticsearch.

Problem description

  1. We create one index per day, such as test_index-2020.03.11, test_index-2020.03.12, …, and we keep eight daily indexes (today's plus the previous seven days).

  2. When data arrives (read one by one or in bulk from a Kafka topic), we need to either update the document if one with the given ID already exists (it may be in any of the eight daily indexes) or save it to the current day's index if it does not.
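
To make the index layout concrete, here is a minimal sketch of how the eight index names could be derived. The date pattern is inferred from the example names above; the class name, method name, and parameters are hypothetical and not part of our actual code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DailyIndexNames {
    // Assumption: suffix pattern inferred from names like test_index-2020.03.11.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** Returns the index name for `today` first, followed by the previous `days - 1` days. */
    public static List<String> recentIndices(String prefix, LocalDate today, int days) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            names.add(prefix + "-" + today.minusDays(i).format(SUFFIX));
        }
        return names;
    }

    public static void main(String[] args) {
        // Eight indexes: today plus the seven preceding days.
        System.out.println(recentIndices("test_index", LocalDate.of(2020, 3, 12), 8));
    }
}
```

An update has to consider all eight names, while a save only ever targets the first one (the current day).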

The solution I am currently trying when data arrives one by one:

  • Using UpdateByQuery with an inline script to update the doc

  • If BulkByScrollResponse returns Updated count 0, then save the doc
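
The two steps above amount to the following control flow, shown here as a rough sketch only. The Elasticsearch calls are replaced by stand-ins: `updateByQuery` represents whatever call yields the updated count (for example `BulkByScrollResponse.getUpdated()`), and `save` represents indexing the doc into the current day's index. Both names, and the class itself, are hypothetical.

```java
import java.util.function.LongSupplier;

public class UpdateOrSave {
    /**
     * Runs the update first and saves only when it reports zero updated documents.
     * Returns true if the save path was taken.
     */
    public static boolean updateOrSave(LongSupplier updateByQuery, Runnable save) {
        long updated = updateByQuery.getAsLong(); // stand-in for the updated count of the response
        if (updated == 0) {
            save.run(); // stand-in for saving the doc to the current day's index
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // With an updated count of 0 the save branch runs.
        boolean saved = updateOrSave(() -> 0L, () -> System.out.println("saving doc"));
        System.out.println("saved = " + saved);
    }
}
```

This is why a non-zero updated count for a doc that does not exist is a problem: the save branch is never reached.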

Issues:

Even when the doc doesn't exist, BulkByScrollResponse still returns a non-zero updated field (1, 2, 3, 4, …), as follows:

BulkIndexByScrollResponse[sliceId=null,updated=1,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s]

Because of this, I am unable to trigger the document save request.

How should I approach this if a bulk of documents (with a set of different doc IDs) needs to be updated with their respective content in a single request? Can I achieve this with UpdateByQuery?

Note: Considering the amount of data to be processed per hour, we need to avoid multiple hits to Elasticsearch.

Doc ID is in the format of str1:str2:Used:Sat Mar 14 23:34:39 IST 2020

But even if the doc doesn't exist, I still see a non-zero updated count.

A couple more points about the approach I am trying:

  • In my case there is always only one doc to update per request, as I am trying to update the doc matching the given ID.

  • We have configured shards and replicas as "number_of_shards": 10, "number_of_replicas": 1.

  • We are going with this approach because we don't know which index the actual doc resides in.

If at most one document matches, the updated field of the response should never be more than 1.

The following are a couple of the outputs I get as part of the response:

BulkIndexByScrollResponse[sliceId=null,updated=9,created=0,deleted=0,batches=1,versionConflicts=1,noops=0,retries=0,throttledUntil=0s]

BulkIndexByScrollResponse[sliceId=null,updated=10,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s]
