HBase row key design: hotspotting vs. locality of reference

90 Views Asked by At

Consider a hypothetical HBase table.

  • The key must encode a 3-tuple (k, m, n) of integers between 0 and 1000.
  • The typical read is a range query over m and n, fixing a value of k.
  • The read load is exponentially distributed with respect to k. In other words, a few values of k are responsible for most of the read load.

Alice argues that the key should look like "k-m-n" in order to exploit locality of reference. Ideally, a single machine should be able to serve an entire query.

Bob argues that the key should look like "sha1(k-m)-n" in order to avoid hotspotting: if k=1 is accessed extremely often, then it would be wise for all the k=1 records to not all be on the same few machines.

Both arguments make sense to me. How do I figure out which option is more scalable/future-proof? Is there a quick, practical way to test this empirically?

0

There are 0 best solutions below