Load data from a separate Kafka cluster into Samza?

I am trying to create a Samza job that resembles the Wikipedia example job as closely as I can. However, in the "WikipediaFeed" object I am trying to get data from a different Kafka broker than the one that is running when you start the Hello-Samza grid.

Do I have to create a thread-safe Kafka consumer inside the "WikipediaFeed" object to consume data from a different Kafka cluster, or is there another way I'm not seeing?

Edit 1: Here is a link to their Wikipedia example. https://github.com/apache/samza-hello-samza/tree/master/src/main

Thanks

There is 1 best solution below

In your example you need to change this config (https://github.com/apache/samza-hello-samza/blob/master/src/main/config/wikipedia-feed.properties):

systems.kafka.consumer.zookeeper.connect=KAFKA_CLUSTER_FRONTING:2181
systems.kafka.producer.bootstrap.servers=KAFKA_CLUSTER_FRONTING:9092
task.inputs=kafka.topic1,kafka.topic2,kafka.topic3

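Replace KAFKA_CLUSTER_FRONTING with the hosts of your fronting Kafka cluster, and list your topics in task.inputs, separated by ",". Each entry takes the form system.topic (here the system is named kafka).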

Edit: Just to be clear, you can deploy your Samza job on one cluster and consume a Kafka topic from another cluster. You only need to change the config in your Samza properties.

For more information, see the Samza configuration reference.

Then, if you need to send your messages to another Kafka cluster after processing, you will need to create another system in your config.

For more information, see https://samza.apache.org/learn/documentation/0.13/api/overview.html
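
As a rough sketch of what such a second system could look like, the fragment below defines an additional Kafka system next to the existing kafka one. The system name kafka-out and the host placeholder OTHER_KAFKA_CLUSTER are assumptions made up for illustration; substitute your own values and check the keys against the configuration reference for your Samza version.

```properties
# Hypothetical second system named "kafka-out" (the name is arbitrary).
# OTHER_KAFKA_CLUSTER is a placeholder for the hosts of the target cluster.
systems.kafka-out.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka-out.consumer.zookeeper.connect=OTHER_KAFKA_CLUSTER:2181
systems.kafka-out.producer.bootstrap.servers=OTHER_KAFKA_CLUSTER:9092
```

With a system defined this way, the task would address it by name when sending, for example with new SystemStream("kafka-out", "some-topic"), where the topic name is again only a placeholder.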