We have a metadata-and-url feed and a content feed in our project. The indexing behaviour of documents submitted through either feed is unpredictable. For the content feed, the documents are removed from the index after a seemingly random interval, every time. For the metadata-and-url feed, the additional metadata we add is dropped, again at random. In the latter case the documents themselves remain in the index; only our custom metadata disappears. Basically, it looks like the feeds get "forgotten" by the GSA after some time. What could be causing this, and how do we go about debugging it?
Points to note:
1) Due to unavoidable reasons, our GSA index is always hovering around the license limit (+/- 1000 documents or so). Could this have an effect? Are fed documents purged when the index nears the license limit? We do have "lock = true" set in the feed records, though.
2) These fed documents are not linked to from any pages and hence (I believe) would have low PageRank. Are fed documents automatically purged if nothing links to them?
3) Our follow patterns include the fed documents.
4) We do not use action=delete with the same documents, so that possibility is ruled out. Also, for the content feed we always post all the documents, so they are not being removed through the feeds themselves.
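For reference, the kind of record described in point 1 can be sketched roughly as below. This is only an illustration of a metadata-and-url feed with the lock attribute set, not our actual feed: the `build_feed` helper, the datasource name `example_source`, the URL, and the `department` metadata field are all made up for the example, and the element names follow the GSA feed XML layout as I understand it.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, records):
    """Build a metadata-and-url feed document and return it as an XML string.

    `records` is a list of (url, metadata_dict) pairs. Hypothetical helper,
    written only to illustrate the shape of a locked feed record.
    """
    root = ET.Element("gsafeed")

    # Header: names the datasource and declares the feed type.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    # One group holding every record; each record carries lock="true".
    group = ET.SubElement(root, "group")
    for url, meta in records:
        record = ET.SubElement(
            group,
            "record",
            {"url": url, "mimetype": "text/html", "lock": "true"},
        )
        metadata = ET.SubElement(record, "metadata")
        for name, content in meta.items():
            ET.SubElement(metadata, "meta", {"name": name, "content": content})

    return ET.tostring(root, encoding="unicode")


# Example values are placeholders, not taken from our real configuration.
feed_xml = build_feed(
    "example_source",
    [("http://example.com/doc1.html", {"department": "sales"})],
)
print(feed_xml)
```

Printing the generated XML before posting it makes it easy to confirm that every record really does carry the lock attribute and the custom metadata we expect.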
When you hit the license limit, the GSA will start dropping documents from the index, so I'd say that's definitely your problem.