Publishing on a specific partition of a topic using pykafka


How is it possible in pykafka to publish a message to a specific partition of a topic? In the following piece of code, the test topic has four partitions and I intend to write each message to one of them, but apparently it isn't working that way.

from pykafka import KafkaClient

import logging
logging.basicConfig()

client = KafkaClient(hosts='localhost:9092')
print(client.topics)
topic = client.topics['test']
with topic.get_producer() as producer:
    for i in range(4):
        # pykafka expects bytes for both the message and the partition key
        message = 'another test message ' + str(i ** 2)
        producer.produce(message.encode(), partition_key='{}'.format(0).encode())

There is 1 best solution below


The key determines which partition a message ends up in.
If you don't supply a key, Kafka distributes messages in a round-robin fashion, so each partition gets roughly the same number of messages.
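
To make the keyless case concrete, here is a minimal sketch of round-robin assignment in plain Python. It does not talk to Kafka at all; `assign_round_robin` and the message names are made up for illustration, and a real client may distribute keyless messages somewhat differently.

```python
from itertools import cycle


def assign_round_robin(messages, num_partitions):
    """Hypothetical helper: hand keyless messages to partitions in turn."""
    partitions = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))
    for message in messages:
        partitions[next(next_partition)].append(message)
    return partitions


# Eight keyless messages spread over four partitions, two per partition.
layout = assign_round_robin(["m%d" % i for i in range(8)], num_partitions=4)
for partition_id, batch in layout.items():
    print(partition_id, batch)
```

Each partition ends up with the same share of messages, but nothing ties a particular message to a particular partition.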

If you provide a key, Kafka hashes it and puts the message in the resulting partition. You don't control exactly which partition is used, only that the same key always ends up in the same partition.
Adding a key to messages is often used to guarantee ordering within some subset of messages. For example, say you have user and transaction entities and you want to process all transactions that pertain to the same user in order. You would achieve that by using userId as the message key.
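
A rough sketch of the idea, again in plain Python with no broker involved. Everything here is an assumption for illustration: `partition_for` and `route` are hypothetical helpers, the user names are invented, and `zlib.crc32` is only a stand-in for whatever hash function a real client uses.

```python
import zlib

NUM_PARTITIONS = 4  # assumed; matches the four-partition topic in the question


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a bytes key to a partition index with a stable hash (crc32 stand-in)."""
    return zlib.crc32(key) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each (user_id, payload) event to the partition chosen by its key."""
    partitions = {p: [] for p in range(num_partitions)}
    for user_id, payload in events:
        index = partition_for(user_id.encode(), num_partitions)
        partitions[index].append((user_id, payload))
    return partitions


events = [
    ("alice", "deposit"),
    ("bob", "deposit"),
    ("alice", "withdraw"),
    ("bob", "transfer"),
    ("alice", "close"),
]
routed = route(events)

# All of alice's transactions sit in one partition, in production order.
alice_partition = routed[partition_for(b"alice")]
print([payload for user, payload in alice_partition if user == "alice"])
```

You can't tell in advance which index a key maps to without running the hash, but you can rely on the mapping being stable for as long as the partition count stays the same.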

There's no coordination between partitions (it would be too slow), so there is no total ordering when multiple partitions are used. Messages are guaranteed to be consumed in the order they were produced only if you place them all in the same partition.
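
Here is a toy illustration of why total ordering is lost, assuming a consumer that happens to drain one partition before moving to the next. The `produce` and `consume_partition_by_partition` helpers are hypothetical, and a real consumer could interleave partitions in some other way; the point is only that the global order is not preserved.

```python
def produce(messages, num_partitions):
    """Hypothetical producer: spread messages over partitions in turn."""
    partitions = [[] for _ in range(num_partitions)]
    for i, message in enumerate(messages):
        partitions[i % num_partitions].append(message)
    return partitions


def consume_partition_by_partition(partitions):
    """Hypothetical consumer: read each partition fully, one after another."""
    consumed = []
    for partition in partitions:
        consumed.extend(partition)
    return consumed


messages = list(range(6))
multi = consume_partition_by_partition(produce(messages, num_partitions=2))
single = consume_partition_by_partition(produce(messages, num_partitions=1))
print(multi)   # [0, 2, 4, 1, 3, 5]
print(single)  # [0, 1, 2, 3, 4, 5]
```

Within each partition the order is intact, but only the single-partition run gives back the original sequence.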

Maybe I should've asked you for your use case first, before writing all this :)