What is the relation between a nodeId and a key in distributed hash tables?

127 Views Asked by At

My understanding of a distributed hash table is that every node can be identified uniquely by a nodeId and can store information, like host, port and values. Every node stores other nodeIds in (a) lookup table(s) and finding another node can be made as efficient as log(n) with system size n. In order to retrieve the value from a node, one would need a key. Is the key for a value just the nodeId (i.e. a content identifier or hash of the value)? If so, then every node can only save one value? Or can a nodeId store multiple key-values, in which case the question arises how to retrieve a value without knowing which node contains which keys.

1

There are 1 best solutions below

0
On

There are two general approaches for building distributed hash tables: distribute keys to nodes based on key values or based on hashes of keys. There are pros and cons for either option.

In either way, we will need to find a nodeId for a given key. A naive implementation could use modulus operation to find a target node. The problem with this approach is that when we do change number of nodes, then we have to remap absolutely every key - this is slow.

Let's say we have 4 nodes and the key is 10. 10 mod 4 is 2 an we will pick node #2 to store the data.

If you do need to handle node addition/removal efficiently, the common approach is called Consistent Hashing - that works similar to what you have described - we have multiple key (or hashes of key) intervals and assign many of those intervals to nodes - we require number of this intervals to be much larger than number of nodes. Hence if we add or remove a node, only some of those intervals need to be remapped to other nodes.