YugabyteDB internals on range queries over hash-partitioned tables

22 Views Asked by At

Is there any good reading on how YB implements range queries over hash-partitioned data? I'm curious because this is a hard problem to solve (yes, even with a good solution, it's generally not advisable to do this query...)

1

There are 1 best solutions below

0
dh YB On

When you are using hash sharding, a range query on the shard key will result in a scatter/gather type query as there is no way to know what tablets to route the queries to. As a general rule, if you are planning on doing a lot or range queries on the data, consider range based sharding instead. Designing a proper range based shard key does require a lot of considerations in order to avoid data imbalances and hot spots. There is a blog article here on how we chose our sharding strategies: https://www.yugabyte.com/blog/four-data-sharding-strategies-we-analyzed-in-building-a-distributed-sql-database/#google-spanner-and-hbase-range-sharding

One solution could be to union all all possible hash codes (from 0 to 65535) because where yb_hash_code(id)=0 is pushed down to get to the right tablet and the range of hash. But that would do 65536 calls which will not be faster.

When you do hash sharding, the hash is at the start of the primary key (the key in rocksdb) https://docs.yugabyte.com/preview/architecture/docdb-sharding/sharding/#hash-sharding

enter image description here