I have read and watched many youtube videos & links which all provide same solution which is:
- Use a distributed counter like zookeeper
- Counter max limit can be 3.5 trillion
- Convert the Counter value to Base62
which is all fine when the counter value is small. e.g. generated counter value: 120001 => base62 value FMJQmhBR
but when the counter provides large counter value like below the base62 value length also increases. generated counter value: 120003658=> base62 value HRGZF8RiHC6y
So how can this be a solution for exact tiny url with exact 8 length.
https://www.linqz.io/2018/10/how-to-build-a-tiny-url-service-that-scales-to-billions.html https://www.youtube.com/watch?v=eCLqmPBIEYs https://www.youtube.com/watch?v=JQDHz72OA3c&t=1862s
Given that you are not hashing every url, but a vaguely-predictable number, you could hash the result and take the first N bits
However, there are many solutions for what to do for collisions
Here's a great video on cuckoo hashing (which is a structure of hashes relevant here):
https://www.youtube.com/watch?v=HRzg0SzFLQQ
Here's an example in Python which finds an 8-character string from the hash which should be fairly unique (this could then be collected into a sorted data structure mapping it into a URL)
This works by first hashing the value with an avalanching hash (SHA-265) and then loops to find the minimum amount of it (slice from the front of the hex string) to form an 8-char base62 string
This could be made much more efficient (even, for example by bisecting), but may be clearer as-is and depends hugely on unspecified algorithm requirements
base62 logic from How to fix the code for base62 encoding with Python3?