I am running an Elasticsearch 1.5.2 query with the "explain" flag turned on. The output for the inverse document frequency is:
{
"description": "idf(docFreq=2, maxDocs=56)",
"value": 3.9267395
}
I understand the idea behind inverse document frequency: if I have 100 docs and one of them includes the word "rhododendron", then the idf = num docs / num docs with term "rhododendron" = 100 / 1.
But where is the maxDocs number coming from in Elasticsearch? I don't see anything in the documentation.
maxDocs is computed by Lucene's IndexReader, and the API documentation describes maxDoc() as returning one greater than the largest possible document number. In other words, maxDocs is the total number of documents in the index (+1), including the deleted ones.
We can confirm this by looking at the source code for IndexReader, which basically shows that the following formula holds true:
numDeletedDocs() = maxDoc() - numDocs()
where numDeletedDocs() returns the total number of deleted documents in the index and numDocs() returns the number of visible documents in the index.
It is also worth noting, though, that depending on which shard copy (primary or replica) is hit by your query, maxDocs can differ (and hence your score, too). See this thread for a full explanation. To mitigate this problem (called "bouncing results"), you can specify the preference parameter in your queries.
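To tie this back to the explain output in the question, here is a minimal sketch of how the idf value depends on maxDocs. It assumes the default TF/IDF similarity of the Lucene version bundled with Elasticsearch 1.x, whose idf is 1 + ln(maxDocs / (docFreq + 1)) rather than the plain ratio from the question. The function names lucene_idf and num_deleted_docs are my own, and the maxDocs value of 60 for a second shard copy is a made-up number for illustration only.

```python
import math


def lucene_idf(doc_freq: int, max_docs: int) -> float:
    """idf as computed by Lucene's classic TF/IDF similarity (assumed default).

    Formula: 1 + ln(maxDocs / (docFreq + 1)).
    """
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def num_deleted_docs(max_doc: int, num_docs: int) -> int:
    """Relationship from IndexReader: numDeletedDocs = maxDoc - numDocs."""
    return max_doc - num_docs


# Values taken from the explain output in the question.
idf_from_question = lucene_idf(doc_freq=2, max_docs=56)
print(round(idf_from_question, 5))

# Hypothetical second shard copy that still holds 4 extra deleted documents:
# same docFreq, larger maxDocs, hence a different idf and a different score.
idf_other_copy = lucene_idf(doc_freq=2, max_docs=60)
print(round(idf_other_copy, 5))
```

The first value agrees with the 3.9267395 reported by explain up to 32-bit float rounding. The second shows why a copy with more not-yet-merged deletions scores the same term differently, which is the cause of the bouncing results mentioned above.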