KafkaSpout multithread or not

77 Views Asked by At

kafka 0.8.x doc shows how to multithread in kafka consumer:

Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, new Integer(a_numThreads));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);


// now launch all the threads
//
executor = Executors.newFixedThreadPool(a_numThreads);

// now create an object to consume the messages
//
int threadNumber = 0;
for (final KafkaStream stream : streams) {
    executor.execute(new ConsumerTest(stream, threadNumber));
    threadNumber++;
}

But KafkaSpout in storm seems to not multithread. Maybe use multi task instead of multithread in KafkaSpout :

builder.setSpout(SqlCollectorTopologyDef.KAFKA_SPOUT_NAME, new KafkaSpout(spoutConfig), nThread);

Which one is better? Thanks

1

There are 1 best solutions below

0
On

Since you mentioned Kafka 0.8.x, I am assuming the KafkaSpout you use is from storm-kafka other than storm-kafka-client.

The first code snippet is high-level consumer's API which could employ multiple threads to consume multiple partitions.

As for the kafka spout, it's probably the same, but Storm is using the low-level consumer, namely SimpleConsumer. However, there will be one SimpleConsumer instance created for each spout executor(task).