Will an incrementing integer PK produce a uniform workload in DynamoDB?

I am looking to index some data in DynamoDB and would like to key on an incrementing integer ID. The higher IDs will get most of the traffic, but that traffic will be spread evenly across tens of thousands of the highest IDs. Will this create uniform data access, which is important for DynamoDB?

AWS don't seem to publish details on the hashing algorithm they use to hash primary keys. I am assuming it is something akin to MD5 where, for example, the hash for 3000 is completely different from the hashes for 3001, 3002 and 3003, and therefore it will result in a uniformly distributed workload.

The reason I ask is that I know this is not the case in S3, where they suggest reversing auto-incrementing IDs in cases like this.

There are 2 best solutions below

0
On BEST ANSWER

AWS have confirmed that using an incrementing integer ID will create an even workload:

If you are using incrementing numbers as the hash key, they will be distributed equally among the hash key space.

Source: https://forums.aws.amazon.com/thread.jspa?threadID=189362&tstart=0

1
On

The DynamoDB documentation doesn't seem to expose the internal workings of the hashing. A lot of places quote MD5, but I am not sure they can be considered authoritative.

An interesting study of the distribution of hashes for number sequences is available here. The relevant data sets are Dataset 4 and Dataset 5, which deal with sequences of numbers. Most hashing functions (MD5 even more so) seem to distribute satisfactorily from the viewpoint of partitioning.
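If you want to check this for yourself, here is a rough sketch in Python that hashes a run of consecutive integer IDs and counts how many land in each bucket. To be clear about the assumptions: DynamoDB's real hash function and partition layout are not published, so MD5 is only a stand-in here, and `NUM_PARTITIONS`, `partition_for` and `bucket_counts` are hypothetical names made up for the illustration.

```python
import hashlib
from collections import Counter

# Hypothetical partition count, chosen only for illustration.
NUM_PARTITIONS = 16


def partition_for(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an integer key to a bucket.

    MD5 is an assumed stand-in; the hash DynamoDB actually uses is not documented.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def bucket_counts(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys fall into each bucket."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Simulate the "tens of thousands of the highest IDs" from the question.
    hot_ids = range(1_000_000, 1_020_000)
    counts = bucket_counts(hot_ids)
    expected = len(hot_ids) / NUM_PARTITIONS
    for bucket in range(NUM_PARTITIONS):
        deviation = (counts[bucket] - expected) / expected
        print(f"bucket {bucket:2d}: {counts[bucket]:5d} keys ({deviation:+.1%} vs. even split)")
```

With a well-mixing hash, each bucket should end up close to the even split even though the input keys are strictly sequential. This only shows how such a hash behaves; it says nothing definitive about how DynamoDB assigns keys to partitions internally.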