Elasticsearch reindex only missing documents

I am trying to reindex an index of 200M documents from cluster A to cluster B. I used the Reindex API with a remote source and everything worked fine. While the reindex was running, some documents were added to cluster A, so I want to add them to cluster B as well.

I launched the reindex request again, but it is taking a long time, as if it were reindexing everything again.

My question is: does the cluster reindex all the documents from scratch, even the ones that didn't change?

My Elasticsearch version is 5.6.

[Screenshots: indexing rate and document deletion rate]

There is 1 answer below

Elasticsearch does not track whether documents have changed since the last reindex. A reindex request reads every document that matches the source query and writes it to the destination index, so running it again copies everything again. If your data has a field like insert_time, you can add a query to the reindex request so that only the part of index A added after a given time is reindexed into B. This lets you keep the result of your earlier reindex and finish much faster. Reindex by query would be something like this:

POST _reindex
{
  "source": {
    "index": "A",
    "query": {
      "range": {
        "insert_time": {
          "gt": "time you want"
        }
      }
    }
  },
  "dest": {
    "index": "B"
  }
}
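
If there is no usable timestamp field, another option might be the following sketch. It assumes the document IDs are the same in both clusters and that the request is sent to cluster B, as with your first remote reindex. Setting op_type to create in dest makes the reindex create only documents that are missing from the target index; documents that already exist cause version conflicts, and "conflicts": "proceed" tells the reindex to count those conflicts and continue instead of aborting. The host below is a placeholder I made up, not something from the question.

```
# Hypothetical host: replace with the address of cluster A
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://cluster-a-host:9200"
    },
    "index": "A"
  },
  "dest": {
    "index": "B",
    "op_type": "create"
  }
}
```

Keep in mind that this still reads every document from A, so it avoids rewriting existing documents but not the full scan, and it will not pick up documents in A that were updated rather than added. The range query approach is the one that actually reduces the amount of data read.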