Background: Two of our three ZooKeeper machines failed, which brought down my Solr cluster. I added new ZooKeeper machines and updated each Solr machine's config to point to them.
After this, I started Solr and used the admin page to run a *:* query, which returned a different count every time I queried the pool.
So I purged all the records in SolrCloud and ran a batch job to repopulate all the data from Oracle into Solr. Everything looked good.
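A minimal sketch of how to repeat that *:* check outside the admin page and see whether the totals agree. It asks Solr only for the hit count (rows=0) and reads numFound from the JSON response. The base URL "http://localhost:8983/solr" and the collection name "mycollection" are hypothetical placeholders; substitute your own.

```python
import json
from typing import Sequence
from urllib.parse import urlencode
from urllib.request import urlopen


def match_all_count_url(base_url: str, collection: str) -> str:
    """Build a *:* select URL that returns only the hit count (rows=0)."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def num_found(response_body: str) -> int:
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def fetch_count(url: str) -> int:
    """Run the query against a live Solr node (requires network access)."""
    with urlopen(url) as resp:
        return num_found(resp.read().decode("utf-8"))


def counts_are_consistent(counts: Sequence[int]) -> bool:
    """True if every repeated *:* query returned the same total."""
    return len(set(counts)) <= 1
```

Calling fetch_count several times in a row and passing the results to counts_are_consistent makes the "different number every time" symptom easy to log from the batch job.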
Problem: I have a daily batch job that updates Solr with the delta (inserts + updates) from Oracle.
Since this incident, the document counts in Solr no longer match the delta. For example, even though only 1,000 records were inserted or updated on a given day, the Solr count is off by more than 10,000.
The counts returned by *:* do not match. We have tried purging the records multiple times. Things look good when we insert the records for the first time after a purge, but as soon as updates start happening, the numbers stop matching.
There are no duplicate records, and if I query for a specific record I get the correct one, but the facet counts are wrong too.
Is the index corrupted?
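A minimal sketch of the count arithmetic, assuming each Oracle row maps to exactly one Solr document keyed by the same uniqueKey. Solr overwrites a document that is re-sent with an existing uniqueKey by default, so updates should leave the *:* total unchanged and only inserts should grow it. The DeltaLoad class and the helper names are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DeltaLoad:
    inserted: int  # rows that are new in Oracle since the last run
    updated: int   # rows that already exist in Solr (same uniqueKey)


def expected_total(count_before: int, delta: DeltaLoad) -> int:
    # An update replaces the existing document with the same uniqueKey,
    # so only inserts should change the *:* total.
    return count_before + delta.inserted


def count_drift(count_before: int, delta: DeltaLoad, count_after: int) -> int:
    """Positive: more documents than expected; negative: fewer."""
    return count_after - expected_total(count_before, delta)
```

With hypothetical numbers (500,000 documents before the run, 200 inserts and 800 updates, 512,000 documents after), count_drift reports 11,800 unexpected documents, which is the kind of gap described above.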
Try optimizing your index. I was facing the same issue, and optimizing the index fixed it.
More information on optimizing:
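A minimal sketch of triggering an optimize through the update handler with the optimize=true request parameter (maxSegments=1 asks Solr to merge down to a single segment). The base URL and collection name are hypothetical placeholders, and sending the command as a plain GET is an assumption about your setup; adjust it if your update handler requires POST.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an /update URL that asks Solr to optimize (merge) the index."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def optimize(base_url: str, collection: str, max_segments: int = 1,
             timeout: float = 3600.0) -> int:
    # Blocks until Solr responds; merging a large index can take a long time,
    # hence the generous timeout. Returns the HTTP status code.
    with urlopen(optimize_url(base_url, collection, max_segments),
                 timeout=timeout) as resp:
        return resp.status
```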
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
PS: Note that an optimize is expensive, so you should not run it more than once a day.
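If the optimize is triggered from the daily batch job, a simple guard can enforce the once-a-day limit. This is a minimal sketch; the should_optimize helper is hypothetical, and where the last-run timestamp is stored (a file, a database row) is left to the job.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_INTERVAL = timedelta(days=1)  # at most one optimize per day


def should_optimize(last_optimized: Optional[datetime], now: datetime,
                    min_interval: timedelta = MIN_INTERVAL) -> bool:
    """Return True if no optimize has run yet or the interval has elapsed."""
    if last_optimized is None:
        return True
    return now - last_optimized >= min_interval
```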