Find the number of already existing documents in Solr with the solrindex job in Nutch

In Nutch's solrindex job, how can we calculate the number of documents that were updated in Solr and the number of documents that were indexed as new documents?

There is 1 answer below

You can use this to see crawldb statistics, including the number of URLs per status (db_fetched, db_notmodified, db_gone, ...):

bin/nutch readdb crawl/crawldb/ -stats

Alternatively, you can dump the crawldb to see every URL that has been crawled, along with its status:

bin/nutch readdb crawl/crawldb/ -dump whole_db
vi whole_db/part-r-00000
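
Rather than reading the dump by eye, the per-status counts can be tallied with standard shell tools. The following is only a sketch: count_statuses is a hypothetical helper name, and it assumes each record in the dump contains a line of the form "Status: 2 (db_fetched)", which may vary between Nutch versions. Note that these are crawldb statuses, not counts reported by Solr itself.

```shell
# Hypothetical helper: tally crawldb records per status from a `readdb -dump` directory.
# Assumption: each record has a line starting with "Status:", e.g. "Status: 2 (db_fetched)".
count_statuses() {
  dump_dir="$1"
  # Concatenate all part files, keep only the status lines,
  # then count identical lines and list the most frequent first.
  cat "$dump_dir"/part-* | grep '^Status:' | sort | uniq -c | sort -rn
}
```

After the dump above, calling `count_statuses whole_db` would print one line per status with its count in front.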