How to model "dimension" tables in TiDB?

168 Views Asked by At

I would like to designate certain tables as replicated to all TiKV stores such that they are always available to join with locally (thereby reducing expensive distributed joins at the TiDB level). This would allow the TiKV coprocessor to join locally to this table because it's always available (ie: replicated to every TiKV). In the OLAP terminology of "dimensions" and "facts", this is a dimension table. In this scenario, I'd like to shard facts and replicate dimensions. It appears that TiDB treats everything as a sharded fact. Can this be done? If not, can it be approximated with some other technique? How amenable is the code base to allowing this type of feature?

1

There are 1 best solutions below

3
On

At present, TiDB splits each table into regions, and do the replication in the region level. It's hard to replicate a table into each TiKV server, even if it only contains one region. For example there are 100 nodes in the TiKV cluster but the configured number of region replica is 5.

We don't need to do the join operation in the TiKV coprocessor. We can read each dimension table from TiKV to multiply TiDB nodes and associate each involved TiDB node a portion of the fact table according to the data distribution of the fact table. Thus the join operation is done in the TiDB layer.

The technique described in the above is not implemented yet. But it's already on our roadmap.