How does Lucene deal with deleted documents in the inverted index?

I wonder how Lucene deals with deleted documents.

I know about segments and how Lucene marks a document as deleted.

In the inverted index, however, each word is mapped to numerous document IDs,

for example:

harry: 1,2,3
potter: 9,10,1
half: 1,6,3

To remove document id = 1, does Lucene have to walk through all those words and strip document id = 1 from each of them? Iterating over all of them to remove it would be extremely costly.

Amit On

Lucene segments are immutable, so when a document is deleted it is not removed from the segment in which it was originally written, and the posting lists of that segment are left untouched. Instead, the document is only marked as deleted: Lucene keeps a per-segment record of which documents are still live, stored alongside the segment. When Lucene (and therefore Elasticsearch) searches, it goes through all the segments, and any matching document that is marked as deleted is removed from the search result. An update works as a delete plus an add: the old copy is marked as deleted and the new version is written to a new segment.
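To make this concrete with the example from the question, here is a minimal Python sketch of the idea. It is a toy model, not Lucene's actual code or API: the `Segment` class, its `deleted` set, and the `delete` and `search` methods are all hypothetical names made up for illustration.

```python
class Segment:
    """Toy immutable segment (hypothetical, not a Lucene class)."""

    def __init__(self, postings):
        # term -> sorted tuple of doc ids; tuples make the immutability explicit
        self.postings = {term: tuple(sorted(ids)) for term, ids in postings.items()}
        # the only mutable part: the doc ids that have been marked as deleted
        self.deleted = set()

    def delete(self, doc_id):
        # constant-time mark; no posting list is touched
        self.deleted.add(doc_id)

    def search(self, term):
        # walk the untouched posting list and skip docs marked as deleted
        return [d for d in self.postings.get(term, ()) if d not in self.deleted]


segment = Segment({"harry": [1, 2, 3], "potter": [9, 10, 1], "half": [1, 6, 3]})
segment.delete(1)

print(segment.search("harry"))     # [2, 3]
print(segment.search("potter"))    # [9, 10]
print(segment.postings["harry"])   # (1, 2, 3), the posting list still contains doc 1
```

So the delete itself costs one mark, regardless of how many words point at document 1. The price is paid a little at a time at search time, when deleted documents are filtered out.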

Finally, a segment merge combines old segments into a new segment, and in that process the deleted documents are dropped, so they no longer appear in the new segment's posting lists.
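A merge can be sketched in the same toy style. Again this is only an illustration under assumptions: `merge_segments` and the `(postings, deleted)` pair representing a segment are hypothetical, and the sketch keeps document IDs unchanged, whereas real Lucene renumbers documents when it merges.

```python
def merge_segments(segments):
    """Merge toy segments into one new segment, dropping deleted docs.

    Each segment is a hypothetical (postings, deleted) pair:
    postings maps term -> list of doc ids, deleted is a set of doc ids.
    """
    merged = {}
    for postings, deleted in segments:
        for term, doc_ids in postings.items():
            # copy over only the documents that are still live
            live = [d for d in doc_ids if d not in deleted]
            if live:
                merged.setdefault(term, []).extend(live)
    # the new segment starts with no deletions at all
    return {term: sorted(ids) for term, ids in merged.items()}, set()


old = ({"harry": [1, 2, 3], "potter": [1, 9, 10], "half": [1, 3, 6]}, {1})
newer = ({"harry": [11], "prince": [11, 12]}, set())

merged_postings, merged_deleted = merge_segments([old, newer])
print(merged_postings["harry"])   # [2, 3, 11]
print(merged_postings["potter"])  # [9, 10]
```

This is where the expensive walk over the posting lists actually happens, but it is done in the background as part of rewriting the segments anyway, not at the moment of the delete.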

Refer to the merge process for more info.