How can leader get elected without entries stored in majority servers?

44 Views Asked by At

I was reading Kuujo's answer to one of the questions on leader election here: RAFT election restrictions

That's right. That the leader has all prior committed entries is a property of the election algorithm, but the actual commitIndex is in no way involved in the election algorithm. In fact, it is entirely possible for a leader to get elected without some entries that have been stored on a majority of servers. This is why commitment is not always determined by counting replicas and it's why leaders must commit an entry from their current term to call entries from past terms committed. See figure 8 in the Raft paper for an explanation of this. – kuujo May 17, 2016 at 21:35

Question: How can a leader get elected, if it doesnt have entries that have been stored on a majority of servers? Doesnt it mean that this candidate is not up-to-date, so shouldnt win an election?

1

There are 1 best solutions below

3
On

A log on a node contains committed and not committed entries. Non committed entries may be on a majority on nodes - but that does not makes them committed.

A candidate asks for votes, followers decided if the log is in a good state be using this rule (from the raft paper):

Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.

In addition to:

If an existing entry conflicts with a new one (same index but different terms), delete the existing entry and all that follow it

And:

Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas; once an entry from the current term has been committed in this way, then all prior entries are committed indirectly

This situation happens when a leader sends messages to a majority of nodes and fails before collecting confirmations.

Edit: forgot to mention, here is how a leader decides on commit index

If there exists an N such that N > commitIndex, a majority of matchIndex[i] ≥ N, and log[N].term == currentTerm: set commitIndex = N

Key item here is the check if the term is the currentTerm, which is owned by this leader. So if the leader crashes, a new one won't be able to declare an entry committed till that new leader appends an entry for its new term.

Edit: from two comments

a) Stored is not committed. A committed entry is the one, which a node may apply to its state machine. This may happen only when a leader gets a record sent to majority for a term owned by that leader. When this happens, all other stored-but-not-committed entries are considered committed as well and those are applied to state machines as well. You touched the importance of stored vs committed in the other question about Figure 8 from Raft paper: a record may be stored on majority of nodes due to a combination of re-elections, and that record is not considered committed and may be overridden.

b) Commenting inline:

I understand stored means that log entry has been copied in the log, but committed means the leader has to know that log entry has been replicated to quorum [yes, plus if it knows an entry is committed, all previous entries are now committed as well]. For a previous leader, log entry got committed as soon as replication quorum was achieved [as soon as a quorum is achieved for a new entry with a term owned by the new leader]. This will be known to follower or next leader in next AppendEntries call [on an appendEntries call with a new entry with leader's term, that will move committed index on the leader, and then on followers]. Previous leader could crash in between. On previous leaders crash, in next term a candidate can be leader only if it had those log entries stored [leader may replicate entries to a minority and crash, nodes in that minority won't vote for a node without those entries initially. Another leader may emerge as there are enough nodes without those entries. The new leader would have greater term and may accept a command from a customer, which would put an entry in leaders log wit leaders term. Now if reelection happens, the same leader would be voted by all nodes as this new leaders log has an entry with greatest term, which makes that log more recent], which it may not know are committed from its perspective. Please tell me what is the correct understanding.