We are using the C++ library librdkafka, version 2.2.0, for both the producer and the consumer.
The Apache Kafka brokers (version 2.7) are installed on RHEL machines.
On the producer side, we verified that the producer writes sequential offsets to the relevant topic partition, as shown below:
==> KAFKA Logger: Message delivered , 202 bytes topic data.pipe.test, partition 12,offset 4736663
==> KAFKA Logger: Message delivered , 202 bytes topic data.pipe.test, partition 12,offset 4736664
==> KAFKA Logger: Message delivered , 202 bytes topic data.pipe.test, partition 12,offset 4736665
==> KAFKA Logger: Message delivered , 202 bytes topic data.pipe.test, partition 12,offset 4736666
.
.
.
On the consumer side, however, we can see that the consumer reads from the same topic partition (topic data.pipe.test, partition 12), but the offsets it reports are not sequential.
We suspect that this non-sequential reading means data is being lost from the topic partition(s), but we are not sure whether the consumer log really reports the true status:
==> KAFKAConsumer getMessage from topic data.pipe.test, partition 12, at offset 4084746 <====
==> KAFKAConsumer getMessage from topic data.pipe.test, partition 12, at offset 4084798 <====
==> KAFKAConsumer getMessage from topic data.pipe.test, partition 12, at offset 4084812 <====
.
.
.
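To check inside the consumer whether these jumps are real, one option is to remember the last offset seen per partition and record every jump larger than one. The following is only a minimal sketch, not our production code: OffsetGapTracker and OffsetGap are hypothetical names, and in a real librdkafka consumer the three arguments would presumably come from RdKafka::Message::topic_name(), partition() and offset().

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record describing one range of offsets that was never seen.
struct OffsetGap {
    std::string topic;
    int32_t partition;
    int64_t first_missing;  // first offset that was not seen
    int64_t last_missing;   // last offset that was not seen
};

// Hypothetical helper (not part of librdkafka): tracks the last offset seen
// for each (topic, partition) and records every jump larger than 1.
class OffsetGapTracker {
public:
    // Call once per consumed message, in the order the consumer returned them.
    void observe(const std::string& topic, int32_t partition, int64_t offset) {
        assert(offset >= 0);  // Kafka offsets are non-negative
        const auto key = std::make_pair(topic, partition);
        const auto it = last_seen_.find(key);
        if (it != last_seen_.end() && offset > it->second + 1) {
            gaps_.push_back({topic, partition, it->second + 1, offset - 1});
        }
        last_seen_[key] = offset;
    }

    // All gaps recorded so far, in the order they were detected.
    const std::vector<OffsetGap>& gaps() const { return gaps_; }

private:
    std::map<std::pair<std::string, int32_t>, int64_t> last_seen_;
    std::vector<OffsetGap> gaps_;
};
```

Feeding the three offsets from the consumer log above into observe() would record two gaps for partition 12: 4084747 to 4084797 and 4084799 to 4084811.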
The consumer group name is data_collection_group
(we can also see the consumer group details with kafka-consumer-groups.sh --bootstrap-server kafka1:9093 --group data_collection_group --describe).
Note: the topic data.pipe.test was created with 70 partitions.
From the librdkafka documentation: https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md
So, based on the above, I want to ask whether we can verify, from the Kafka cluster side or with Kafka tools (as in the example at https://strimzi.io/blog/2021/12/17/kafka-segment-retention/),
that all messages were consumed correctly, or whether the consumer missed some offsets.
Additionally, I want to quote the following from https://redpanda.com/guides/kafka-performance/kafka-consumer-lag:
"Kafka consumer lag is a key performance indicator for the popular Kafka streaming platform.
All else equal, lower consumer lag means better Kafka performance. The table below
summarizes common causes of Kafka consumer lag. We will explore these causes in more detail later in this article."
So, is it right to treat Kafka consumer lag as an indication that consumers are not reading the data?
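For context, our understanding is that the LAG column printed by kafka-consumer-groups.sh is simply LOG-END-OFFSET minus CURRENT-OFFSET for each partition. A minimal sketch of that arithmetic follows; partition_lag is a hypothetical helper name, and the numbers in the usage note are illustrative, not taken from the tool's output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: lag for a single partition, assuming it is defined as
// the log end offset minus the consumer group's committed (current) offset.
inline int64_t partition_lag(int64_t log_end_offset, int64_t committed_offset) {
    assert(log_end_offset >= 0);
    assert(committed_offset >= 0);
    // Clamp at zero in case the committed offset is ahead of a stale end offset.
    return log_end_offset > committed_offset
               ? log_end_offset - committed_offset
               : 0;
}
```

For example, partition_lag(4736667, 4084813) gives 651854. If this understanding is correct, lag only compares the committed position with the end of the log, so a consumer that skipped offsets in the middle of the partition would report the same lag as one that read every message.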