Consider a hypothetical HBase table.
- The key must encode a 3-tuple `(k, m, n)` of integers between 0 and 1000.
- The typical read is a range query over `m` and `n`, fixing a value of `k`.
- The read load is exponentially distributed with respect to `k`. In other words, a few values of `k` are responsible for most of the read load.
Alice argues that the key should look like `k-m-n` in order to exploit locality of reference. Ideally, a single machine should be able to serve an entire query.
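For concreteness, here is a minimal sketch of how I read Alice's layout, assuming zero-padded decimal fields so lexicographic order matches numeric order (the padding width and the helper name `AliceKey.rowKey` are my own, not anything settled):

```java
import java.nio.charset.StandardCharsets;

class AliceKey {
    // "k-m-n" with zero padding: all rows for one k sort contiguously,
    // so a fixed-k scan over m and n is a single contiguous range.
    static byte[] rowKey(int k, int m, int n) {
        return String.format("%04d-%04d-%04d", k, m, n)
                .getBytes(StandardCharsets.UTF_8);
    }
}
```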
Bob argues that the key should look like `sha1(k-m)-n` in order to avoid hotspotting: if `k=1` is accessed extremely often, then it would be wise for the `k=1` records not to all sit on the same few machines.
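And a sketch of how I read Bob's layout (again just an illustration; the hex encoding of the digest and the helper name `BobKey.rowKey` are my assumptions):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class BobKey {
    // "sha1(k-m)-n": the hash prefix scatters a hot k across the keyspace,
    // but a fixed-k query now becomes one range scan per value of m.
    static byte[] rowKey(int k, int m, int n) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(
                String.format("%04d-%04d", k, m).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return String.format("%s-%04d", hex, n).getBytes(StandardCharsets.UTF_8);
    }
}
```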
Both arguments make sense to me. How do I figure out which option is more scalable/future-proof? Is there a quick, practical way to test this empirically?