Are there any tools or operations that can mitigate data loss when a Kafka broker fails in a multi-node Kafka cluster?

Replication is an important feature of Kafka and a key element in avoiding data loss. In particular, should one of your brokers go down, the replicas on the other brokers will be used by the consumers as if nothing had happened (from the business side). Of course, this has consequences for connections, bandwidth, etc.

However, a message must have been successfully produced before it can be replicated.
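As a rough illustration of what "successfully produced" can mean in practice, here is a minimal sketch of producer settings that favor durability. The keys are standard Kafka producer properties, but the broker addresses, the chosen values, and the is_durable helper are assumptions made up for this example, not something the answer prescribes.

```python
# Producer settings that favor durability over latency.
# The keys are standard Kafka producer properties; the broker addresses
# and the is_durable helper are hypothetical, for illustration only.
producer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # assumed addresses
    "acks": "all",                 # wait for all in-sync replicas to acknowledge
    "enable.idempotence": "true",  # avoid duplicates when sends are retried
    "retries": "5",                # retry transient send failures
}


def is_durable(config):
    """Return True if the producer waits for all in-sync replicas."""
    # "-1" is the numeric equivalent of "all" for the acks property.
    return config.get("acks") in ("all", "-1")
```

A dictionary like this could be handed to whichever Kafka client library you use; with acks set to 0 or 1, a message can be lost if the leader fails before the followers have copied it.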

So basically, if your replication factor is set higher than 1, you should be safe, as long as your producers don't go down before their messages have been written.

The default.replication.factor is 1, so set the replication factor (per topic or as the broker-wide default) to 2 or 3. Of course, you need at least 2 or 3 brokers for that.

http://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
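The relationship between replication factor, broker count, and tolerated failures can be sketched with a little arithmetic. This is only an illustration: both function names are hypothetical, and the failure count assumes every replica is in sync at the moment a broker fails.

```python
def max_broker_failures(replication_factor):
    """Broker failures a partition can survive without losing data.

    Assumes all replicas are in sync when the failures happen.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def check_replication(replication_factor, broker_count):
    """Raise if the cluster is too small for the requested factor.

    Each replica of a partition lives on a different broker, so the
    factor cannot exceed the number of brokers.
    """
    if replication_factor > broker_count:
        raise ValueError(
            f"replication factor {replication_factor} needs at least "
            f"{replication_factor} brokers, but only {broker_count} exist"
        )
    return max_broker_failures(replication_factor)
```

For example, check_replication(3, 3) returns 2, while the default factor of 1 gives max_broker_failures(1) == 0, meaning a single broker failure can already lose data.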