The Confluent-Kafka Python Client provides multiple ways to commit topic offsets:
- Using a message (using the offset value associated with it)
- Using a `TopicPartition` structure (which also has an associated offset)
- Using `commit()` with no arguments, which (presumably) uses the value of the offset from the consumer
When using the last option, there is little opportunity for anything to go wrong, because the broker is tracking the offset value. (Actually, I am assuming the broker tracks the offset value. The alternative would be that the consumer tracks the latest offset it has read. It doesn't actually matter where that data lives; the point is that it is an automated part of the API.)
When using the first option, it would seem that message ordering is important.
If we write a program which consumes a number of messages, processes some of them, and retains the others, we might want to commit just some of the offsets.
- It only makes sense to commit an offset if all messages with smaller offsets have been consumed and processed
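The rule above can be sketched as a small helper that finds the highest offset it is safe to commit for one partition, given the offsets whose processing has finished. This is a minimal sketch and not part of the Confluent-Kafka API: the function name `next_commit_offset` is hypothetical, and it assumes the usual Kafka convention that the committed offset is the offset of the *next* message to read (one past the last processed message).

```python
from typing import Iterable, Optional


def next_commit_offset(first_offset: int, processed: Iterable[int]) -> Optional[int]:
    """Return the offset to commit for one partition, or None if nothing is safe.

    first_offset: offset of the first message fetched (assumed to be the
        partition's current committed offset).
    processed: offsets of messages whose processing has finished, in any order.

    Only the unbroken run first_offset, first_offset + 1, ... counts; a gap
    means an earlier message is still outstanding, so nothing past it may be
    committed. The result is one past the last message in that run.
    """
    done = set(processed)
    offset = first_offset
    while offset in done:
        offset += 1
    # If the very first message is not yet processed, there is nothing to commit.
    return offset if offset > first_offset else None


# Messages 100, 101 and 103 are done, but 102 is still being processed:
# only 100 and 101 form a contiguous prefix, so the offset to commit is 102.
print(next_commit_offset(100, [101, 100, 103]))  # 102
print(next_commit_offset(100, [101, 103]))       # None
```

With confluent_kafka, the returned value would be used as the offset of a `TopicPartition(topic, partition, offset)` passed to `consumer.commit(offsets=[...])`.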
Given the above condition, the question then becomes:
- Does the order in which messages are committed matter?
If we commit the message with offset=101, followed by the message with offset=100, what effect does this have? Will the stored offset be "rewound" back to 100, or will the broker see a request to commit a smaller value and ignore it?
It appears to matter what order the messages are committed in.
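One way to picture the behaviour described in the results below is that each commit simply overwrites the stored offset, with no check that it only moves forward. The following toy model is a minimal sketch under that assumption; it does not talk to Kafka, and `OffsetStore`, `MonotonicCommitter` and the topic name are hypothetical. It also assumes, as the client does when committing a message, that the stored value is the message's offset plus one. The second half shows a client-side guard that skips any commit which would move a partition's offset backwards.

```python
from typing import Callable, Dict, Tuple

# A (topic, partition) pair identifies where an offset belongs.
Partition = Tuple[str, int]


class OffsetStore:
    """Toy stand-in for the stored group offsets: the last commit wins."""

    def __init__(self) -> None:
        self.committed: Dict[Partition, int] = {}

    def commit(self, partition: Partition, offset: int) -> None:
        # No monotonicity check: a smaller offset overwrites a larger one.
        self.committed[partition] = offset


class MonotonicCommitter:
    """Client-side guard that only forwards commits which move forward."""

    def __init__(self, commit: Callable[[Partition, int], None]) -> None:
        self._commit = commit
        self._highest: Dict[Partition, int] = {}

    def commit(self, partition: Partition, offset: int) -> bool:
        """Forward the commit and return True, or skip it and return False."""
        if offset <= self._highest.get(partition, -1):
            return False  # would rewind (or repeat) the stored offset
        self._highest[partition] = offset
        self._commit(partition, offset)
        return True


tp = ("example-topic", 0)  # hypothetical topic and partition

# Committing message 101 and then message 100 stores 102, then 101
# (offset + 1 each time); the second, smaller commit wins because it is last.
raw = OffsetStore()
raw.commit(tp, 101 + 1)
raw.commit(tp, 100 + 1)
print(raw.committed[tp])  # 101: rewound, so message 101 would be read again

# The same sequence through the guard keeps the larger offset.
guarded_store = OffsetStore()
guard = MonotonicCommitter(guarded_store.commit)
guard.commit(tp, 101 + 1)
guard.commit(tp, 100 + 1)  # skipped, returns False
print(guarded_store.committed[tp])  # 102
```

In a real consumer, the wrapped callable would call something like `consumer.commit(offsets=[TopicPartition(topic, partition, offset)])`. Note that such a guard only covers commits made by this one process, not commits made by other consumers in the same group.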
I wrote three test scripts:
If I run these scripts in order, the final script starts consuming messages from the second produced message.
In other words, if the messages produced are:
The final consumer will consume:
Not:
Inverting the order of the message commits confirms the behaviour, in which case the final consumer will consume:
Code: