Are partial UUIDs a good idea?


I need to generate and store an identifier per row in a distributed database (high write throughput). There are constraints on the length of the ID, and it should be as short as possible. The ID must be valid UTF-8.

I was considering generating a UUIDv4, taking its base16 (hex) representation, removing the hyphens, and keeping only a prefix of the characters. If we need more characters in the future, we would take a longer prefix.

e.g. Uuid = 123e4567-e89b-12d3-a456-426655440000

Subset = 123e4567e89b

Are there foreseeable issues with this?


brianolive On BEST ANSWER

You cannot guarantee that partial UUIDs will be universally unique. Depending on the number of IDs generated, this might not be an issue, especially if you check for duplicates. But perhaps it's better just to write your own ID generator with the length specification that you need. I suppose the actual specification for UUIDs requires a certain number of bits for each to be deemed universally unique, but your requirements limit length. They do not require the use of actual UUIDs.
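To put a rough number on "depending on the number of IDs generated", here is a minimal sketch using the standard birthday-bound approximation. It assumes the kept bits are uniformly random; the function name `collision_probability` is made up for this illustration.

```python
import math

def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance of at least one collision among num_ids random IDs.

    Uses the birthday-bound approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)),
    which assumes every ID is drawn uniformly from 2^bits possible values.
    """
    space = 2.0 ** bits
    exponent = -num_ids * (num_ids - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(exponent)

# 12 hex characters = 48 random bits, as in the question's example subset.
print(collision_probability(1_000_000, 48))
# A full UUIDv4 carries 122 random bits.
print(collision_probability(1_000_000, 122))
```

Under that assumption, one million 48-bit IDs have about a 0.18% chance of containing at least one collision, so a duplicate check on insert still matters at that length.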

StephenS On

If your field must be text and length matters, then using base16 only gives you 4 bits per byte whereas base64 gives 6 bits per byte. In other words, the former needs 50% more bytes to achieve the same collision probability as the latter. You could get to ~7 bits per byte by taking advantage of how UTF-8 works, but that's a lot more work (and risk) for a lot less gain.
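The arithmetic behind that comparison can be sketched as follows. The helper name `chars_needed` is hypothetical, and it assumes each encoded character occupies one byte, which holds for the ASCII alphabets of base16 and base64.

```python
import math

def chars_needed(bits: int, bits_per_char: int) -> int:
    """Number of encoded characters required to carry `bits` bits of randomness."""
    return math.ceil(bits / bits_per_char)

for bits in (48, 72, 96):
    hex_len = chars_needed(bits, 4)   # base16: 4 bits per character
    b64_len = chars_needed(bits, 6)   # base64: 6 bits per character
    print(f"{bits} bits: {hex_len} hex chars vs {b64_len} base64 chars")
```

For 48 bits this gives 12 hex characters against 8 base64 characters, which is the 50% overhead described above.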

There is no point in using a truncated UUID, though; you have to use the whole thing or its anti-collision properties don't hold. If you just want a random string, especially when you have the ability to check for collisions, just generate a random number with the desired number of bits (preferably a multiple of 6) and then base64 encode it.
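A minimal sketch of that approach, assuming Python and its standard `secrets` module as the random source. The name `random_id` is hypothetical, and the code maps 6-bit chunks directly onto the URL-safe base64 alphabet rather than calling a base64 library, so any multiple of 6 bits works without padding characters.

```python
import secrets
import string

# URL-safe base64 alphabet: 64 ASCII characters, 6 bits each.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

def random_id(bits: int = 48) -> str:
    """Return a random ID carrying `bits` bits, one character per 6 bits."""
    if bits <= 0 or bits % 6 != 0:
        raise ValueError("bits must be a positive multiple of 6")
    value = secrets.randbits(bits)
    chars = []
    for _ in range(bits // 6):
        chars.append(ALPHABET[value & 0x3F])  # take the low 6 bits
        value >>= 6
    return "".join(chars)

print(random_id(48))  # 8 characters
print(random_id(72))  # 12 characters
```

If more length is needed later, new IDs can simply be generated with a larger `bits` value. Checking for an existing row before insert (or relying on a unique constraint) covers the remaining collision risk.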