Using a Kafka topic to feed seed URLs to StormCrawler

204 Views Asked by aeranginkaman At 14 October 2020 at 04:51

We want to feed seed URLs from a Kafka topic to a StormCrawler-based project. Do we need to change StormCrawler itself to do this?

There is 1 best solution below

Julien Nioche On 14 October 2020 at 08:42

You'd need to change the topology a bit: add a KafkaSpout and connect it to the StatusUpdaterBolt, as we do in the ES archetype with the FileSpout. The KafkaSpout has to generate the same sort of output as the FileSpout on the status stream, i.e. URL, metadata and status (with a value of DISCOVERED). If that's difficult, you can insert a bolt between the KafkaSpout and the StatusUpdaterBolt to convert the strings from Kafka into that output.
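To make the conversion step concrete, here is a minimal sketch of the logic such an intermediate bolt could run on each Kafka message. It is written in plain Java without Storm or StormCrawler dependencies so it can be read and run on its own. The names SeedLineConverter and SeedTuple are hypothetical, and the message layout (a URL optionally followed by tab-separated key=value pairs, as in a seed file) is an assumption about what you publish to the topic, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns one Kafka message into a (url, metadata, status) triple. */
final class SeedLineConverter {

    /** Plain-Java stand-in for the values a bolt would emit on the status stream. */
    static final class SeedTuple {
        final String url;
        final Map<String, List<String>> metadata;
        final String status;

        SeedTuple(String url, Map<String, List<String>> metadata, String status) {
            this.url = url;
            this.metadata = metadata;
            this.status = status;
        }
    }

    /**
     * Assumed message layout: URL, then optional tab-separated key=value pairs.
     * Returns null for blank messages so the caller can skip them.
     */
    static SeedTuple convert(String message) {
        if (message == null) {
            return null;
        }
        String[] parts = message.trim().split("\t");
        String url = parts[0].trim();
        if (url.isEmpty()) {
            return null;
        }
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                continue; // ignore fragments without a key
            }
            String key = parts[i].substring(0, eq).trim();
            String value = parts[i].substring(eq + 1).trim();
            if (key.isEmpty()) {
                continue;
            }
            // A key may appear several times, so values are collected in a list.
            metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        // Seeds enter the crawl as newly discovered URLs.
        return new SeedTuple(url, metadata, "DISCOVERED");
    }

    public static void main(String[] args) {
        SeedTuple t = convert("https://example.com/\tsource=kafka");
        System.out.println(t.url + " " + t.metadata + " " + t.status);
    }
}
```

In an actual topology this logic would sit in the execute method of a bolt placed between the KafkaSpout and the StatusUpdaterBolt, which would emit the three values on the status stream using StormCrawler's own Metadata and Status types in place of the map and string used here.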
