For TiDB, do incrementing values in a secondary index cause hotspots too?

I want to ask about TiDB index design. The docs note that an auto-increment key causes a write hotspot because TiDB uses range-based sharding. Does the same apply to secondary indexes? After all, a secondary index entry is just another key-value pair in TiKV. If so, would an index on an updated_at field (which is monotonically increasing) cause hotspots?


Yes, in TiDB, monotonically increasing values in a secondary index can also cause hotspots. Secondary index entries are stored as key-value pairs in TiKV, just like row data, and they are sharded by the same range-based mechanism [1]. The index key is built from the table ID, the index ID, and the indexed column values, so consecutive column values produce adjacent keys.

When a large amount of data is written to a table with a secondary index whose values keep increasing (such as the updated_at field you mentioned), every new index entry sorts after all existing ones. The writes therefore concentrate on the Region that holds the tail of the index key range instead of spreading across the cluster, and that Region becomes a bottleneck for the entire system [1].
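
The effect can be illustrated with a toy model of range-based sharding. This is only a sketch, not TiKV's actual implementation: the split points in `REGION_STARTS` and the integer keys are made up for illustration, and real Regions split dynamically by size and load.

```python
import bisect
from collections import Counter

# Hypothetical split points: each Region covers keys from its start key
# up to (but not including) the next Region's start key.
REGION_STARTS = [0, 1_000, 2_000, 3_000]


def region_for(key: int) -> int:
    """Return the index of the Region whose key range contains `key`."""
    return bisect.bisect_right(REGION_STARTS, key) - 1


def write_distribution(keys) -> Counter:
    """Count how many writes land in each Region."""
    return Counter(region_for(k) for k in keys)


# Monotonically increasing index values, standing in for updated_at:
# every new key is larger than all keys written before it.
monotonic_keys = range(3_500, 3_600)
print(write_distribution(monotonic_keys))  # Counter({3: 100})
```

All 100 writes land in the last Region, which is the hotspot pattern described above.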

To avoid hotspots caused by secondary indexes, you can consider the following strategies:

  1. Scatter the leading part of the key: If possible, put a scattered value in front of the monotonically increasing one. For example, index a small bucket or hash column together with updated_at, with the bucket column first. A suffix does not help, because keys are ordered by their leading bytes and the entries would still sort by updated_at. A scattered prefix distributes writes more evenly across Regions.

  2. Use a composite index: Instead of creating a secondary index on a single monotonically increasing field, create a composite index whose leading column has more diverse values and place the increasing field after it. This only helps if the leading column is not itself monotonic; an index that starts with updated_at still writes to the tail of the key range.

  3. Adjust the Region size: You can tune the size of TiKV Regions, which changes how often they split. However, new keys from a monotonically increasing index still land in the last Region, so on its own this is unlikely to remove the hotspot. It also requires careful consideration and testing, as it affects the overall performance and resource utilization of the TiDB cluster.
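
The first two strategies can be sketched with a toy model of range-based sharding in which the index key is the pair (bucket, updated_at). Everything here is an assumption for illustration: `NUM_BUCKETS`, the `bucketed_key` helper, and the one-Region-per-bucket split points are hypothetical, and in a real table the bucket would be an extra column (for example, one computed by the application) placed first in a composite index.

```python
import bisect
from collections import Counter

NUM_BUCKETS = 4  # hypothetical number of buckets

# Toy Regions for an index on (bucket, updated_at): one Region per bucket,
# each starting at (bucket, 0). Real split points would differ.
REGION_STARTS = [(b, 0) for b in range(NUM_BUCKETS)]


def region_for(key: tuple) -> int:
    """Return the index of the Region whose key range contains `key`."""
    return bisect.bisect_right(REGION_STARTS, key) - 1


def bucketed_key(row_id: int, updated_at: int) -> tuple:
    """Prefix the increasing value with a bucket derived from the row id."""
    return (row_id % NUM_BUCKETS, updated_at)


# 100 rows written with steadily increasing updated_at values.
writes = Counter(
    region_for(bucketed_key(row_id, 3_500 + row_id)) for row_id in range(100)
)
print(writes)  # each of the 4 Regions receives 25 writes
```

In this model the writes spread evenly across the four Regions. The cost is on the read side: a range query on updated_at alone can no longer use one contiguous index range and has to look in every bucket.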

These strategies can mitigate hotspots caused by secondary indexes, but they come with trade-offs in query performance and index size. Evaluate your specific use case and workload characteristics before settling on an index design.

I hope this answers your question. If you have any further inquiries, please feel free to ask.

[1]: TiDB Documentation - TiDB Best Practices