Apache Nutch: writing crawled docs to RabbitMQ


Currently I have the Elasticsearch indexer plugin writing docs in batches to Elasticsearch. I now want to write these docs to a RabbitMQ exchange instead.

I tried writing to the exchange inside the Elasticsearch plugin's write method. This worked when run manually in local mode, but it did not work when run on the Hadoop cluster.

I've also looked at the publish-rabbitmq plugin, but it looks event-focused rather than document-focused.

Is there an available plugin to do what I want or do I need to write my own?


There is 1 best solution below


You're after an indexing plugin similar to https://github.com/apache/nutch/tree/master/src/plugin/indexer-solr but one that works for RabbitMQ. At the moment this does not exist. I did something similar for a client some time ago, but sadly it is not open source.

Basically, what you need to do is write your own plugin with a class that implements IndexWriter, and fill in the implementation of each method.

Take a look at indexer-solr, indexer-elastic and https://github.com/apache/nutch/blob/master/src/plugin/indexer-dummy/, which is the simplest of the three and is provided precisely as a learning/testing tool.
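
To give a rough idea of the shape such a writer might take, here is a minimal sketch in plain Java. It does not use the real Nutch or RabbitMQ classes: `DocWriter` is a hypothetical stand-in for the open/write/delete/commit/close lifecycle of an index writer, and `MessagePublisher` is a hypothetical wrapper around whatever actually publishes to the exchange. The routing keys `doc.index` and `doc.delete` and the hand-rolled JSON are also assumptions made for illustration. The real method signatures differ between Nutch versions, so check them against indexer-dummy before copying anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index writer lifecycle (NOT the real Nutch interface).
interface DocWriter {
    void open();
    void write(Map<String, String> doc);
    void delete(String key);
    void commit();
    void close();
}

// Hypothetical abstraction over the broker; a real implementation would
// wrap a RabbitMQ channel and publish to the configured exchange.
interface MessagePublisher {
    void publish(String routingKey, String body);
}

// Buffers documents and deletions, then publishes them in batches.
class QueueDocWriter implements DocWriter {
    private final MessagePublisher publisher;
    private final int batchSize;
    private final List<String[]> pending = new ArrayList<>();

    QueueDocWriter(MessagePublisher publisher, int batchSize) {
        this.publisher = publisher;
        this.batchSize = batchSize;
    }

    @Override
    public void open() {
        // A real writer would open the broker connection here.
        pending.clear();
    }

    @Override
    public void write(Map<String, String> doc) {
        pending.add(new String[] {"doc.index", toJson(doc)});
        flushIfFull();
    }

    @Override
    public void delete(String key) {
        pending.add(new String[] {"doc.delete", "{\"id\":\"" + escape(key) + "\"}"});
        flushIfFull();
    }

    @Override
    public void commit() {
        // Publish everything buffered so far, in arrival order.
        for (String[] message : pending) {
            publisher.publish(message[0], message[1]);
        }
        pending.clear();
    }

    @Override
    public void close() {
        // Flush what is left; a real writer would also close the connection.
        commit();
    }

    private void flushIfFull() {
        if (pending.size() >= batchSize) {
            commit();
        }
    }

    // Very small JSON encoder for flat string fields (illustration only).
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

class QueueDocWriterDemo {
    public static void main(String[] args) {
        // Print instead of publishing, so the sketch runs without a broker.
        DocWriter writer = new QueueDocWriter(
                (routingKey, body) -> System.out.println(routingKey + " " + body), 2);
        writer.open();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "http://example.org/");
        doc.put("title", "Example");
        writer.write(doc);
        writer.close();
    }
}
```

Keeping the broker behind a small interface like this makes the batching logic testable without a running RabbitMQ instance. In a real plugin the connection would presumably be created in open() and released in close(), rather than inside write().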