Suppose I want a highly available Kafka cluster in production for a small deployment. I have to use the following configs:
min.insync.replicas=2 // Don't want to lose messages if 1 broker crashes
default.replication.factor=3 // Lets the producer keep writing if 1 replica disappears in a broker crash
Will Kafka start creating a new replica if 1 broker crashes and 1 replica is lost with it?
Do we need at least default.replication.factor brokers at all times to keep working?
Well, you can set replication.factor to the same value as min.insync.replicas, but there may be some challenges. During a broker outage, all partition replicas on that broker become unavailable, so the availability of the affected partitions is determined by the existence and status of their other replicas.
If a partition has no additional replica, the partition becomes totally unavailable. But if a partition has additional replicas that are in-sync, one of these in-sync replicas will become the interim partition leader. If the partition has additional replicas but none are in-sync, we have a choice to make: either wait for the partition leader to come back online (sacrificing availability) or allow an out-of-sync replica to become the interim partition leader (sacrificing consistency).
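The three cases above can be sketched as a small decision function. This is only a simplified model for illustration, not Kafka's actual controller logic: `elect_leader` and its parameters are hypothetical names, brokers are plain integer IDs, and the `allow_unclean` flag stands in for the trade-off that Kafka exposes through the `unclean.leader.election.enable` setting.

```python
from typing import Optional, Sequence, Set


def elect_leader(
    replicas: Sequence[int],
    isr: Set[int],
    alive: Set[int],
    allow_unclean: bool = False,
) -> Optional[int]:
    """Pick an interim leader for one partition in a simplified model.

    replicas: broker IDs that host a replica of the partition, in preference order
    isr: broker IDs whose replica was in sync before the outage
    alive: broker IDs that are currently reachable
    allow_unclean: accept an out-of-sync replica as leader (hypothetical flag)
    """
    # Case 1: a live in-sync replica exists, so it can take over safely.
    for broker in replicas:
        if broker in alive and broker in isr:
            return broker

    # Case 2: only out-of-sync replicas are alive; choosing one keeps the
    # partition available but may lose messages (sacrifices consistency).
    if allow_unclean:
        for broker in replicas:
            if broker in alive:
                return broker

    # Case 3: no usable replica; the partition stays unavailable until a
    # suitable broker comes back (sacrifices availability).
    return None
```

For example, with replicas on brokers 1, 2 and 3, all in sync, and broker 1 down, the function returns broker 2. With replicas on brokers 1 and 2 where only broker 1 was in sync and broker 1 is down, it returns `None` unless `allow_unclean=True` is passed.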
So it becomes necessary for every partition to have an extra in-sync replica available to survive the loss of the partition leader. That implies that min.insync.replicas should be set to at least 2.
For the minimum ISR size to be 2, the replication factor must be at least 2 as well. However, if there are only 2 replicas and one broker is unavailable, the ISR size drops to 1, which is below the minimum, so producers using acks=all can no longer write to the partition. Hence, it is better to have a replication factor greater than the minimum ISR size (at least 3).
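The arithmetic behind this recommendation can be checked with a short sketch. It rests on some simplifying assumptions: every replica lives on a different broker, all replicas were in sync before the failure, and the producer uses acks=all (min.insync.replicas is only enforced for such producers). The function names are hypothetical and not part of any Kafka API.

```python
def remaining_isr(replication_factor: int, brokers_down: int) -> int:
    """In-sync replicas left after some brokers fail (simplified model)."""
    # Each failed broker removes at most one replica of the partition.
    return max(replication_factor - brokers_down, 0)


def accepts_acks_all_writes(
    replication_factor: int, min_insync_replicas: int, brokers_down: int
) -> bool:
    """True if the ISR is still large enough for acks=all writes."""
    return remaining_isr(replication_factor, brokers_down) >= min_insync_replicas


if __name__ == "__main__":
    min_isr = 2
    for rf in (2, 3):
        for down in (0, 1, 2):
            ok = accepts_acks_all_writes(rf, min_isr, down)
            print(f"replication factor={rf}, brokers down={down}, writable={ok}")
```

Under these assumptions, a replication factor of 2 with min.insync.replicas=2 stops accepting writes as soon as one broker is down, while a replication factor of 3 tolerates one broker failure and only blocks writes when two brokers are down.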