Suppose I want a highly available Kafka in production on a small deployment. I have to use the following configs:
min.insync.replicas=2 // Don't want to lose messages in case of 1 broker crash
default.replication.factor=3 // Lets the producer keep writing if 1 replica disappears with a broker crash
Will Kafka start creating a new replica if 1 broker crashes and 1 replica is lost with it?
Do we need at least default.replication.factor brokers at all times to keep working?
Well, you can set the replication factor to the same value as min.insync.replicas, but there may be some challenges. During a broker outage, all partition replicas on that broker become unavailable. At that point, the availability of the affected partitions is determined by the existence and status of their other replicas.
If a partition has no additional replica, the partition becomes totally unavailable. But if a partition has additional replicas that are in sync, one of these in-sync replicas will become the interim partition leader. If the partition has additional replicas but none are in sync, we have a choice to make: either wait for the partition leader to come back online, sacrificing availability, or allow an out-of-sync replica to become the interim partition leader, sacrificing consistency.
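The three cases above can be written down as a small decision function. This is only an illustrative sketch, not Kafka's actual election code: the function name and its parameters are made up for this answer. The `allow_unclean_election` flag stands in for the choice between availability and consistency, which in Kafka is controlled by the `unclean.leader.election.enable` setting.

```python
def state_after_leader_loss(in_sync_followers: int,
                            out_of_sync_followers: int,
                            allow_unclean_election: bool) -> str:
    """Describe what happens to one partition when its leader's broker dies.

    Hypothetical helper for illustration; the counts exclude the lost leader.
    """
    if in_sync_followers > 0:
        # An in-sync replica takes over, nothing acknowledged is lost.
        return "available: in-sync replica becomes leader"
    if out_of_sync_followers == 0:
        # The leader was the only replica.
        return "unavailable: no other replica exists"
    if allow_unclean_election:
        # Availability is kept, but messages the follower never fetched are lost.
        return "available: out-of-sync replica becomes leader (possible data loss)"
    # Consistency is kept, but the partition is offline until the leader returns.
    return "unavailable: waiting for the old leader"


if __name__ == "__main__":
    print(state_after_leader_loss(1, 0, False))
    print(state_after_leader_loss(0, 1, False))
    print(state_after_leader_loss(0, 1, True))
```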
So in that case, it becomes necessary for any partition to have an extra in-sync replica available to survive the loss of the partition leader. That implies that min.insync.replicas should be set to at least 2.
In order to have a minimum ISR size of 2, the replication factor must be at least 2 as well. However, if there are only 2 replicas and one broker is unavailable, the ISR size drops to 1, which is below the minimum, so producers using acks=all can no longer write to that partition. Hence, it is better to have a replication factor greater than the minimum ISR size (at least 3).
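To make the arithmetic concrete, here is a minimal sketch that checks whether a producer using acks=all can still write after some brokers fail. The function name is hypothetical, and the model is deliberately optimistic: it assumes every failed broker hosted one replica of the partition and that all surviving replicas are in sync.

```python
def can_write_with_acks_all(replication_factor: int,
                            min_insync_replicas: int,
                            brokers_down: int) -> bool:
    """Best-case check for one partition (hypothetical helper).

    Assumes each failed broker held one replica of this partition and
    every replica on a surviving broker is still in sync.
    """
    isr_size = max(replication_factor - brokers_down, 0)
    return isr_size >= min_insync_replicas


if __name__ == "__main__":
    # Replication factor 2, min ISR 2: a single broker failure blocks writes.
    print(can_write_with_acks_all(2, 2, 1))
    # Replication factor 3, min ISR 2: one broker can fail and writes continue.
    print(can_write_with_acks_all(3, 2, 1))
    # Replication factor 3, min ISR 2: a second failure blocks writes again.
    print(can_write_with_acks_all(3, 2, 2))
```

With the configs from the question (replication factor 3, min.insync.replicas 2), this model tolerates exactly one broker failure for writes.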