Kafka client is sending requests to a partition whose leader broker went down


I am using the kafka-node module to send messages to Kafka, in a clustered environment with a topic that has 3 partitions and a replication factor of 3.

The topic describe output is:

Topic:clusterTopic      PartitionCount:3        ReplicationFactor:3    Configs:min.insync.replicas=2,segment.bytes=1073741824
        Topic: clusterTopic     Partition: 0    Leader: 1       Replicas: 1,2,3 Isr: 1,2,3
        Topic: clusterTopic     Partition: 1    Leader: 2       Replicas: 2,3,1 Isr: 1,2,3
        Topic: clusterTopic     Partition: 2    Leader: 3       Replicas: 3,1,2 Isr: 1,2,3

Producer config -

        "requireAcks": 1,
        "attributes": 2,
        "partitionerType": 2,
        "retries": 2
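
In kafka-node these options are split across two places: `requireAcks` and `partitionerType` go on the producer, while `attributes` (compression) is set per send payload. A minimal sketch of the mapping, assuming the same broker addresses as below (the `require` lines are commented out so the fragment stands alone):

```javascript
// Sketch of how the config above maps onto kafka-node (assumed setup):
// const kafka = require('kafka-node');
// const client = new kafka.KafkaClient({ kafkaHost: 'kafka:9092,kafka:9093' });
// const producer = new kafka.HighLevelProducer(client, producerOptions);

const producerOptions = {
  requireAcks: 1,     // wait only for the partition leader's ack
  partitionerType: 2, // 2 = cyclic (round-robin) partitioner
};

// Compression is a payload attribute in kafka-node, not a producer option:
const payload = { topic: 'clusterTopic', messages: ['hello'], attributes: 2 };

console.log(producerOptions.requireAcks, payload.attributes); // 1 2
```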

When I send data, the cyclic partitioner (partitionerType: 2) distributes messages across partitions in round-robin fashion.
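
The cyclic partitioner is just a counter modulo the partition count, which is why every third message still targets the dead leader's partition. A minimal sketch of that behavior (illustrative code, not kafka-node's internals verbatim):

```javascript
// Sketch of what a cyclic (round-robin) partitioner does: messages cycle
// through partitions regardless of whether each partition's leader is alive.
let counter = 0;
function cyclicPartition(partitionCount) {
  const p = counter % partitionCount;
  counter += 1;
  return p;
}

// Four sends against a 3-partition topic wrap back to partition 0:
const targets = [1, 2, 3, 4].map(() => cyclicPartition(3));
console.log(targets); // [ 0, 1, 2, 0 ]
```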

When I follow the steps below:

  • Get a HighLevelProducer instance connected to kafka:9092,kafka:9093
  • send a message
  • stop the kafka-server:9092 manually
  • try to send another message with the HighLevelProducer; send() triggers the callback with an error: TimeoutError: Request timed out after 30000ms

What I expect is that if a partition is not accessible (because its leader broker is down), the producer should automatically send the data to the next available partition. Instead, I am losing the message because of the exception.

The exception is as follows -

  TimeoutError: Request timed out after 3000ms
    at new TimeoutError (\package\node_modules\kafka-node\lib\errors\TimeoutError.js:6:9)
    at Timeout.timeoutId._createTimeout [as _onTimeout] (\package\node_modules\kafka-node\lib\kafkaClient.js:980:14)
    at ontimeout (timers.js:424:11)
    at tryOnTimeout (timers.js:288:5)
    at listOnTimeout (timers.js:251:5)
    at Timer.processTimers (timers.js:211:10)
(node:56416) [DEP0079] DeprecationWarning: Custom inspection function on Objects via .inspect() is deprecated
  kafka-node:KafkaClient kafka-node-client reconnecting to kafka1:9092 +3s
  kafka-node:KafkaClient createBroker kafka1 9092 +1ms
  kafka-node:KafkaClient kafka-node-client reconnecting to kafka1:9092 +3s
  kafka-node:KafkaClient createBroker kafka1 9092 +0ms

There is 1 answer below


Please share your bootstrap servers to confirm, but based on the information at hand, I believe you are experiencing the following:

  • You have min.insync.replicas set to 2
  • You have acks set to 1

With these settings, the producer will send the event to the leader replica and assume the message is safe.

If the leader fails immediately after the send, before the followers have caught up, you will lose the message, as you are waiting for only one ack.

From a broker perspective, however, you are specifying that the topic requires 2 in-sync replicas to be available. By default, only in-sync replicas are allowed to be elected leader. Since the failure of the first broker can cause the followers to fall out of sync, your topic may be forced offline. You can verify this in your tests; it depends on your broker settings.

To rectify, try the following:

  1. If high availability is most important, set min.insync.replicas to 1 and acks to 1
  2. If data loss is not acceptable, set min.insync.replicas to 2 and acks to all
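
On the producer side, these two trade-offs map onto kafka-node's `requireAcks` option, where `-1` corresponds to `acks=all` in the Kafka protocol (a sketch; `min.insync.replicas` is a topic/broker setting and is shown only as a comment):

```javascript
// 1. Availability first: pair with min.insync.replicas=1 on the topic.
const availabilityFirst = { requireAcks: 1 };  // leader ack only

// 2. No data loss: pair with min.insync.replicas=2 on the topic.
const durabilityFirst = { requireAcks: -1 };   // wait for all in-sync replicas

console.log(availabilityFirst.requireAcks, durabilityFirst.requireAcks); // 1 -1
```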

You can also set unclean.leader.election.enable to true for high availability, as this allows an out-of-sync replica to be elected leader, but then there is a chance of data loss.