Are there multiple Base62 encoding algorithms?


I was watching a tutorial on system design for a tiny URL service and reading up on base62 encoding as a way to avoid collisions. They say to use a counter and encode it with base62. This makes sense, but when I tried some online base62 encoders, some of them generated more than 7 characters, even though the tiny URL is supposed to be limited to, say, 7 characters. Are there multiple types of base62 encoding? For example, these two websites give 2 different results for the same input of 1000000.


2

There are 2 best solutions below

1
Vlad Feinstein On

Base62 and Base64 encodings are used to represent binary data as text.

I am not sure what practical use base62 has. Base64, on the other hand, represents 6 bits per character. Your sample value 1,000,000 (hex 0xF4240) uses 20 bits, so it fits into 4 base64 characters.

Your first example encodes the plain text 1000000, which is 7 characters of 8 bits each, or 56 bits in total. That requires 10 base64 characters.

You will get similar numbers for base62, but the encoding must be non-trivial: 62 is not a power of two, so you can't simply chop your data into 6-bit pieces.
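To make the arithmetic concrete, here is a minimal Python sketch that encodes 1000000 once as a number and once as the 7 bytes of the text "1000000". The alphabet order (digits, lowercase, uppercase) and the helper name `base62_encode_int` are my own assumptions. Treating the text as one big integer is only one plausible way an encoder might handle text input, so the exact characters from a given website may differ.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z. Other encoders may order it differently.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode_int(n: int) -> str:
    """Encode a non-negative integer in base62 by repeated division."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


# Input treated as the number one million (what a URL-shortener counter would do).
as_number = base62_encode_int(1000000)

# Input treated as the 7 ASCII bytes of the text "1000000" (56 bits).
as_text = base62_encode_int(int.from_bytes(b"1000000", "big"))

# How many distinct counter values fit into 7 base62 characters.
capacity = 62 ** 7

print(as_number)     # 4c92
print(len(as_text))  # 10
print(capacity)      # 3521614606208
```

So a counter can go up to about 3.5 trillion before the code outgrows 7 characters, as long as it is encoded as a number and not as text.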

The wiki link above mentions multiple variants, so the encoder and decoder do have to agree on which one to use. But this is NOT the issue you saw in your two examples.

0
Basil Musa On

Yes, there are multiple algorithms for base62. You need to use the same algorithm implementation to decode what you already encoded, or else it won't decode properly.

The two Base62 algorithms in use are:

(1) A bigint-based algorithm, which is a bit slow with O(n^2) time complexity. See e.g. https://github.com/jxskiss/base62/issues/2

(2) Variadic length encoding, which is much faster at O(n). Example implementations in Go (https://github.com/jxskiss/base62) and in Java (https://github.com/glowfall/base62)

Whichever one you use, you have to keep using it for decoding as well. Otherwise you get incorrect results.
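As an illustration of the bigint approach, and of what goes wrong when encoder and decoder disagree, here is a rough Python sketch. It is not the API of either library linked above. The names `bigint_encode`, `bigint_decode`, `LOWER_FIRST` and `UPPER_FIRST` are hypothetical, and the mismatch shown is only a different alphabet order, which is just one of the ways two implementations can differ.

```python
import string

# Two assumed alphabet orders; real implementations may use either or something else.
LOWER_FIRST = string.digits + string.ascii_lowercase + string.ascii_uppercase
UPPER_FIRST = string.digits + string.ascii_uppercase + string.ascii_lowercase


def bigint_encode(data: bytes, alphabet: str = LOWER_FIRST) -> str:
    """Treat the bytes as one big integer and divide repeatedly by 62."""
    n = int.from_bytes(data, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, 62)  # each big-integer division costs O(n): O(n^2) overall
        out.append(alphabet[rem])
    # Leading zero bytes vanish in the integer, so keep one zero digit for each.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return alphabet[0] * pad + "".join(reversed(out))


def bigint_decode(text: str, alphabet: str = LOWER_FIRST) -> bytes:
    """Inverse of bigint_encode, provided the same alphabet is used."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    pad = len(text) - len(text.lstrip(alphabet[0]))
    n = 0
    for ch in text[pad:]:
        n = n * 62 + index[ch]
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


encoded = bigint_encode(b"tiny")
same = bigint_decode(encoded)                # decoded with the matching alphabet
other = bigint_decode(encoded, UPPER_FIRST)  # decoded with a mismatched alphabet

print(same == b"tiny")   # True
print(other == b"tiny")  # False
```

Decoding with the matching alphabet returns the original bytes, while the mismatched decoder silently returns different bytes and raises no error. That is why the choice has to stay fixed once values have been stored.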