I am creating a system where a front-end service pushes messages to a Kafka 'request' topic and listens on another 'response' topic for some downstream back-end consumer (actually a complex system that eventually pushes back to Kafka) to do processing on the 'request' message and eventually push to the 'response' topic.
I am trying to figure out the most elegant way to make sure that the consumer listens on the appropriate partition and receives the response, and that the back-end pushes to the partition that the front-end consumer is listening on. We always need to ensure that the response goes to the same consumer that produced the initial message.
I have three candidate solutions as of now, but none is particularly satisfying. Any thoughts or ideas would be greatly appreciated:
- Have each front-end decide which partition it will listen on and pass that partition with the message to the 'request' topic. When the processing on the back-end is finished, it will look at the message's partition member and push to the appropriate partition. An immediate issue here is how to coordinate front-end services so that there's an even distribution on each partition (random assignment?).
- Each message has a correlation ID (a GUID), so for each request to our front-end, we could begin listening on a partition chosen by hashing the GUID modulo the total number of partitions, then push the message to the 'request' topic. The back-end would then look at the correlation ID to determine the appropriate partition to push to. An issue here is that for each request that comes in, the front-end must establish a new consumer on a new partition (is there overhead here?) and will potentially have multiple active consumers on the same partition as well as many active consumers across many partitions.
- Have a single consumer group with an equal number of consumers and partitions, then go with an approach similar to the first one, but let Kafka decide which consumer is on which partition. But then we need to figure out what happens when rebalancing occurs, especially for messages already in flight in the back-end (as potentially all partition assignments could change?).
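To make the hashing step in the second option concrete, here is a minimal sketch. It assumes both sides agree on a fixed partition count (`NUM_PARTITIONS` is a made-up value) and uses CRC32 only as an example of a hash that is stable across processes; `partition_for` is a hypothetical helper, not part of any Kafka client, and it does not reproduce Kafka's own default partitioner.

```python
import uuid
import zlib

# Assumed value for illustration; front-end and back-end must agree on it.
NUM_PARTITIONS = 12


def partition_for(correlation_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a correlation ID to a partition index in [0, num_partitions)."""
    # zlib.crc32 gives the same result in every process, unlike Python's
    # built-in hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(correlation_id.encode("utf-8")) % num_partitions


# The front-end picks the partition to listen on before sending the request.
correlation_id = str(uuid.uuid4())
front_end_choice = partition_for(correlation_id)

# The back-end later derives the same partition from the correlation ID alone.
back_end_choice = partition_for(correlation_id)
```

Both sides only agree while the partition count stays the same, so adding partitions to the 'response' topic would change the mapping for requests already in flight.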
This seems like it should be a common pattern, so I am wondering how others have solved it.
Please do not use consumers with manually assigned partitions. It can get really messy and it is hard to scale.
Instead of partitions, you can use a topic per front-end service. Each front-end service produces a message containing its own id to the 'request' topic. The back-end then consumes the message and, based on that id, produces a response message to a dedicated 'unique-front-end-service-response' topic.
This can be a good solution if you have a constant number of front-end services. A possible disadvantage is that you must create a new topic every time you add a front-end service. However, it would be a lot easier to maintain than manual partition assignment.
Another possible solution is to use a different tool. If Kafka is not mandatory, rethink your requirements and do some research; there is probably a tool that fits your needs better than Kafka.
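The topic-per-front-end routing could look roughly like the sketch below. It uses an in-memory dictionary as a stand-in for the broker rather than a real Kafka client, and every name in it (`response_topic_for`, `send_request`, `process_requests`, the `service_id` field, the `<id>-response` naming scheme) is an assumption made for illustration.

```python
import json
import uuid
from collections import defaultdict

REQUEST_TOPIC = "request"

# In-memory stand-in for a broker: topic name -> list of serialized messages.
topics = defaultdict(list)


def response_topic_for(service_id: str) -> str:
    """Derive the dedicated response topic name for one front-end service."""
    return f"{service_id}-response"


def send_request(service_id: str, payload: dict) -> str:
    """Front-end side: publish a request tagged with the sender's id."""
    correlation_id = str(uuid.uuid4())
    message = {
        "service_id": service_id,
        "correlation_id": correlation_id,
        "payload": payload,
    }
    topics[REQUEST_TOPIC].append(json.dumps(message))
    return correlation_id


def process_requests() -> None:
    """Back-end side: consume requests and reply on the sender's own topic."""
    while topics[REQUEST_TOPIC]:
        request = json.loads(topics[REQUEST_TOPIC].pop(0))
        response = {
            "correlation_id": request["correlation_id"],
            "result": request["payload"],  # placeholder for real processing
        }
        reply_topic = response_topic_for(request["service_id"])
        topics[reply_topic].append(json.dumps(response))


cid_a = send_request("front-end-a", {"n": 1})
cid_b = send_request("front-end-b", {"n": 2})
process_requests()
```

Each front-end then subscribes only to its own response topic and matches replies to pending requests by correlation ID, so no manual partition assignment is involved.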