Apache Kafka - Partition


I have been working with Kafka for about six months, and I have some questions about consumer lag and how data is stored in a topic's partitions.

Question 1: When I first started reading about Kafka and learning how to use it, I was under the impression that a topic with one partition and a replication factor of one would work well. After about six months of work, once my project went live, the consumer that reads my messages from the topic started to lag. I read many Stack Overflow answers about consumer lag and concluded that increasing the topic's partitions and replication factor would remove it. What I really want to know is whether this would actually clear the lag after six months of data flowing into the topic. Could someone help me get rid of this lag?

Question 2: Suppose I increase the number of partitions and the replication factor for the topic. How will my producer write data to the topic from then on? Until now there was one partition, all the data went into it, and my consumer group had only one consumer (the default one), which read from that single partition. Will my data be distributed among the topic's partitions, i.e. the first message in one partition and the next message in another? Also, do I need any changes on the consumer side, such as starting several consumers to read from the topic, so that ordering is preserved? I need to receive my data in the same order I published it to the topic.

If someone could give a clear solution to both problems, that would be great. Thanks in advance.

Best answer

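If your consumer has lag, then you are producing (pushing) to the topic faster than you are reading from it. Increasing the number of partitions lets you run several consumers in parallel. For example, if you have 16 partitions and 4 consumers with the same group id, each consumer will read 4 partitions. That reduces the amount of data each consumer has to process (by up to a factor of 4 in the best case). Adding partitions does little on its own while only one consumer is running, and the replication factor only affects durability, not how fast consumers read, so raising it will not reduce lag.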
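The 16-partition, 4-consumer example can be modeled with a small simulation. This is a simplified, range-style split written for illustration (the class `GroupAssignmentSketch` and its `assign` method are hypothetical); Kafka's real assignors live in the client library and also handle multiple topics and rebalances. It also shows why a single-partition topic cannot be shared: the extra consumers in the group receive nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of range-style partition assignment in one consumer group (not Kafka's assignor code). */
public class GroupAssignmentSketch {
    // Split partitions 0..numPartitions-1 into contiguous ranges, one range per consumer.
    // When the division is uneven, the first consumers get one extra partition each.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers;
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> mine = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                mine.add(next++);
            }
            result.add(mine);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> plan = assign(16, 4);
        for (int c = 0; c < plan.size(); c++) {
            System.out.println("consumer-" + c + " -> partitions " + plan.get(c));
        }
        // One partition, four consumers: only the first consumer gets work.
        System.out.println(assign(1, 4));
    }
}
```

Running it prints four consumers with four partitions each, then `[[0], [], [], []]` for the single-partition topic, i.e. three of the four consumers stay idle. Partition count caps the useful number of consumers in a group.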

When you push a message to Kafka, you can specify a key. Based on that key, the producer's partitioner decides which partition the message goes to. For keyed messages, Kafka's Java producer computes:

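// Older Java producer; newer versions compute Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions
return Utils.abs(Utils.murmur2(record.key())) % numPartitions;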
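To see that formula in action without a running cluster, here is a self-contained sketch that re-implements the hash. It is modeled on Kafka's `Utils.murmur2` and `Utils.toPositive` for illustration; the class `KeyPartitionSketch` is hypothetical, and it assumes the key is serialized as UTF-8 bytes (as a `StringSerializer` would do). In a real application the producer does this for you; check the client source before relying on exact partition numbers.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative re-implementation of keyed partitioning: murmur2(key) with sign bit cleared, mod partitions. */
public class KeyPartitionSketch {
    // 32-bit murmur2, intended to mirror Kafka's Utils.murmur2 (seed 0x9747b28c).
    static int murmur2(final byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        // Mix the key four bytes at a time (little-endian).
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix the remaining 1-3 bytes; the cases intentionally fall through.
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clears the sign bit so the modulo below is never negative.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        for (String user : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(user + " -> partition " + partitionFor(user, 16));
        }
    }
}
```

The specific partition numbers it prints are not meaningful on their own; what matters is that, for a fixed partition count, the same key always maps to the same partition.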

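If you do not specify a key, messages are spread across all partitions (round-robin in older clients; newer clients use a "sticky" strategy that fills a batch for one partition before switching to another), and Kafka only guarantees order within a single partition, not across partitions. So, if you need ordering (for example, per user), you can use the user id as the key. Then all messages for one user always go to the same partition, in the order you pushed them. Note that increasing the partition count changes the key-to-partition mapping for new messages, so per-key ordering can break around the moment you add partitions.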
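The per-user ordering argument can be checked with an in-memory model of a keyed topic. This is an illustration, not Kafka code: the class `PerKeyOrderSketch` is hypothetical, and it uses `Arrays.hashCode` as a stand-in for murmur2, since all the argument needs is that the same key always maps to the same partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model of a keyed topic (stand-in hash, not Kafka's murmur2). */
public class PerKeyOrderSketch {
    static final class Message {
        final String userId;
        final int seq;

        Message(String userId, int seq) {
            this.userId = userId;
            this.seq = seq;
        }

        @Override
        public String toString() {
            return userId + "#" + seq;
        }
    }

    // Stand-in partitioner: deterministic hash of the key bytes, sign bit cleared, mod partitions.
    static int partitionFor(String key, int numPartitions) {
        int h = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (h & 0x7fffffff) % numPartitions;
    }

    // Append each message, in send order, to the log of the partition its key maps to.
    static List<List<Message>> produce(List<Message> messages, int numPartitions) {
        List<List<Message>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Message msg : messages) {
            partitions.get(partitionFor(msg.userId, numPartitions)).add(msg);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Message> sent = new ArrayList<>();
        for (int seq = 0; seq < 5; seq++) {
            sent.add(new Message("alice", seq));
            sent.add(new Message("bob", seq));
        }
        List<List<Message>> topic = produce(sent, 4);
        for (int p = 0; p < topic.size(); p++) {
            System.out.println("partition " + p + ": " + topic.get(p));
        }
    }
}
```

Each user's messages end up in exactly one partition, in send order, so a consumer reading that partition sees `alice#0` through `alice#4` in order; messages of different users may share a partition or land in different ones, and no order is implied between them.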