My team and I developed a method for solving the split-brain problem in two-node systems and published a paper on it. We are not a well-known team, but we believe the method is both new and practical, so we would like to discuss it here and hear whether others consider it a major innovation.
The problem we try to solve
Let me first describe the problem. In a two-node distributed system with no third node, if the link between the two nodes fails, the system cannot perform leader election with both availability (liveness) and consistency (safety). This makes it impossible to design a two-node distributed storage or database system that stays both available and consistent.
Engineers have tried many ways to work around this. Some use more reliable hardware between the two nodes to avoid link failure; others use a third node or a shared medium for arbitration. Both approaches place additional requirements on the hardware.
The solution we proposed
In this paper, we propose a new method that relies on neither an additional third node or shared medium nor a reliable link. We call it a "level-based leader election algorithm", although this name is not used in the paper.
Consider a partially synchronous (also called eventually synchronous or semi-synchronous) distributed storage or database system composed of S server nodes and accessed by C client nodes.
- When S>=3, the S server nodes can use any uniform consensus algorithm, such as Paxos or Raft, to elect the leader. In this case, both availability (liveness: a leader will eventually be elected) and consistency (safety: there are never two different leaders at the same time) can be guaranteed.
- When S<=2 and C>=1, the client nodes also participate in the leader election. As long as C>=1, a two-server system has at least 3 participating nodes in total, so a uniform consensus algorithm can be used to elect the leader in a partially synchronous system. However, only server nodes have both the right to vote and the right to be elected; client nodes have only the right to vote.
- When S<=2 and C=0, fewer than 3 nodes participate in the election, so no algorithm can guarantee both availability and consistency. But in this case there are also no requests from any client node! We can choose either of the two server nodes as the leader, since neither availability nor consistency is needed while no client is present.
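The three cases above can be sketched as a small decision function. This is only an illustrative sketch, not code from the paper: the names `ElectionPlan` and `plan_election` are hypothetical, and the simple-majority quorum is an assumption borrowed from Paxos/Raft-style algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionPlan:
    """Hypothetical summary of who takes part in a leader election."""
    mode: str        # which of the three cases applies
    voters: int      # nodes with the right to vote
    candidates: int  # nodes with the right to be elected (servers only)
    quorum: int      # votes needed to win; 0 means no vote is held


def plan_election(servers: int, clients: int) -> ElectionPlan:
    """Pick the election case for S servers and C clients.

    Assumption: a simple majority of the voters forms a quorum.
    """
    if servers < 1 or clients < 0:
        raise ValueError("need at least one server and a non-negative client count")
    if servers >= 3:
        # Case 1: the servers alone run a standard consensus algorithm.
        return ElectionPlan("servers-only", servers, servers, servers // 2 + 1)
    if clients >= 1:
        # Case 2: clients vote too, but only servers can be elected.
        voters = servers + clients
        return ElectionPlan("servers-and-clients", voters, servers, voters // 2 + 1)
    # Case 3: no clients, so no requests; any server may be picked.
    return ElectionPlan("arbitrary", servers, servers, 0)


if __name__ == "__main__":
    for s, c in [(3, 0), (2, 1), (2, 0)]:
        print(s, c, plan_election(s, c))
```

With two servers and one client, for example, the sketch yields three voters, two candidates, and a quorum of two, so a server that can still reach the client can win the election even when the server-to-server link is down.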
Request
I hope you can help us judge whether this method can be regarded as a major innovation. In addition, we plan to add the level-based leader election primitive to current storage and database request/response protocols. If anyone is interested, please let me know.
I would rather comment on this question, but I don't have the reputation, so I can only post an answer.
I think this could be a cool innovation and could be useful in some applications, particularly those where there are fewer than five servers and the clients are long-lived and do not change often. However, I think it would be useful to consider the literature on state machine reconfiguration algorithms. In particular, if you have a system of three nodes, you can safely reconfigure to two nodes using only the two available nodes, because they form a majority. But if you have a system of more than five nodes and all but two of them become unavailable at once, it would not be possible to safely reconfigure to only two nodes, as you might have missed what a majority of the others had agreed upon.
A similar situation also occurs when too many clients fail. First, imagine all but two servers fail and three or more clients are used to agree on something. Then all the clients fail or disconnect and all the servers come back online. There is no way here to agree on something new using only the two servers, and there is no way to reconfigure the consensus group to include the servers that have restarted or to remove the clients that have failed. In this situation, a deadlock would have been reached (at least until those clients come back).
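To make the counting argument concrete, here is a minimal sketch assuming simple-majority quorums over a fixed membership set. The helper names (`majority`, `can_make_progress`) and the node labels are made up for illustration and are not taken from the paper.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def can_make_progress(members: set, reachable: set) -> bool:
    """True if the reachable nodes include a majority of the current members.

    Nodes that are reachable but not members do not count toward the quorum.
    """
    return len(members & reachable) >= majority(len(members))


if __name__ == "__main__":
    # Three-node group, two nodes available: reconfiguration is possible.
    print(can_make_progress({"s1", "s2", "s3"}, {"s1", "s2"}))

    # Six-node group, only two nodes available: no majority.
    six = {"s1", "s2", "s3", "s4", "s5", "s6"}
    print(can_make_progress(six, {"s1", "s2"}))

    # Group was formed from two servers and three clients; then every
    # client fails and the other servers restart outside the group.
    group = {"s1", "s2", "c1", "c2", "c3"}
    servers_only = {"s1", "s2", "s3", "s4", "s5"}
    print(can_make_progress(group, servers_only))

    # One client returning restores a majority of the group.
    print(can_make_progress(group, servers_only | {"c1"}))
```

Under these assumptions, the restarted servers do not help because they are not members of the current group, and changing the membership itself requires a quorum of the old group.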
Best wishes, Michael :)