The fundamental method to convert a hex string to base64 in Python 3


I want to convert a given hex string into base64 in Python without using any libraries. As I learned from other Stack Overflow answers, we can either group 3 hex digits (12 bits, 4 bits each) to get 2 base64 characters (12 bits, 6 bits each), or group 6 hex digits (24 bits) to get 4 base64 characters (24 bits).

The standard procedure is to concatenate the bits of all the hex digits and split them, starting from the left, into groups of 6.
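A minimal sketch of that grouping step, assuming the input is a string of hex digits. `hex_to_sextets` and `ALPHABET` are hypothetical names chosen for illustration. The helper zero-fills the last group and does not add any `=` padding, so it shows only the bit manipulation, not a complete base64 encoder.

```python
# Standard base64 alphabet: index 0..63 -> character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def hex_to_sextets(hex_str):
    """Concatenate the 4-bit value of each hex digit, then split into 6-bit groups."""
    bits = "".join(format(int(digit, 16), "04b") for digit in hex_str)
    # Zero-fill on the right so the bit count is a multiple of 6.
    remainder = len(bits) % 6
    if remainder:
        bits += "0" * (6 - remainder)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

sextets = hex_to_sextets("4d616e")  # 6 hex digits = 24 bits -> 4 groups
print(sextets)                                # [19, 22, 5, 46]
print("".join(ALPHABET[s] for s in sextets))  # TWFu
```

Six hex digits (24 bits) map to exactly four base64 characters with no leftover bits; any other length leaves a partial final group, which is where the padding question comes in.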

My question is about the cases where padding is needed. Assuming we convert 3 hex digits into 2 base64 characters, at some point we may be left with only 2 or 1 hex digits to convert. Take the example below:

'a1' to base64

10100001 (binary of a1)

101000 01(0000)   (split into groups of 6, appending 0s where required)

This gives "oQ". One source gives the answer as oQ==, while another gives something different: wqE=.

Q1. Which of the two sources gives the correct answer? And why is the other one wrong, even though it is a reputable online tool?

Q2. How do we determine the number of '=' signs here? (We could just append enough 0s where needed, as in the example above, making the answer simply oQ rather than oQ==, assuming oQ== is correct.)

My understanding is this: if the remaining hex has length 2 (rather than 3), we pad with a single '=' (which matches the answer wqE= above); if it has length 1 (rather than 3), we pad with two '='s.

At the same time, I am confused because, if 3 hex digits are converted into 2 base64 characters, we would never need two '='s.

'a' to base64

1010 (binary of a)

Q3. How do we convert the hex 'a' to base64?

Answer

Base64 is defined by RFC 4648 as being "designed to represent arbitrary sequences of octets". Octet is a unit of 8 bits, in practice synonymous with byte. When your input is in the form of a hex string, your first step should be to decode it into a byte string. You need two hex characters for each byte. If the length of the input is odd, the reasonable course of action is to raise an error.
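As a sketch of that first step without any imports, the hypothetical helper `hex_to_bytes` below turns each pair of hex digits into one byte and refuses odd-length or non-hex input instead of guessing.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def hex_to_bytes(hex_str):
    """Decode a hex string into bytes, two hex digits per byte."""
    if len(hex_str) % 2 != 0:
        raise ValueError("hex string must have an even number of digits")
    if any(c not in HEX_DIGITS for c in hex_str):
        raise ValueError("input contains a non-hex character")
    return bytes(int(hex_str[i:i + 2], 16) for i in range(0, len(hex_str), 2))

print(hex_to_bytes("a1"))      # b'\xa1'
print(hex_to_bytes("4d616e"))  # b'Man'
```

If using the standard library is acceptable for this step, `bytes.fromhex` does the same job and also raises `ValueError` on odd-length input such as 'a'.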

To address your numbered questions:

Q1: Even if you are going to implement your own encoder, you can use the Python standard library to investigate. Decoding the two results back to bytes gives:

>>> import base64
>>> base64.b64decode(b'oQ==')
b'\xa1'
>>> base64.b64decode(b'wqE=')
b'\xc2\xa1'

So oQ== is correct, while wqE= has an extra c2 byte in front. My guess is that it comes from UTF-8 encoding the character U+00A1 before applying Base64. To confirm:

>>> '\u00a1'.encode('utf-8')
b'\xc2\xa1'

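Q2: The padding rules are in RFC 4648, section 4. The input is processed in groups of 3 bytes (24 bits), each producing 4 characters. If the final group has only 1 byte (8 bits), it yields 2 characters followed by "=="; if it has 2 bytes (16 bits), it yields 3 characters followed by "=". The number of '=' therefore depends on how many bytes, not hex digits, are left over, and the encoded length is always a multiple of 4. For 'a1', a single byte, that gives oQ==.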
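A library-free sketch of the whole conversion, applying the RFC 4648 padding rules with the standard alphabet. The names `ALPHABET`, `b64encode_bytes` and `hex_to_base64` are hypothetical, chosen for illustration; this is not the `base64` module's implementation.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_bytes(data):
    """Base64-encode bytes, padding the final group with '=' as in RFC 4648."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]  # 1, 2 or 3 bytes
        n = len(chunk)
        # Pack the chunk into a 24-bit integer, zero-filling missing bytes.
        group = int.from_bytes(chunk + b"\x00" * (3 - n), "big")
        sextets = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # n bytes carry 8*n bits, which need n + 1 characters.
        out.extend(ALPHABET[s] for s in sextets[:n + 1])
        # Pad to 4 characters: "==" after 1 byte, "=" after 2, none after 3.
        out.append("=" * (3 - n))
    return "".join(out)

def hex_to_base64(hex_str):
    """Decode hex to bytes (two digits per byte), then base64-encode."""
    if len(hex_str) % 2 != 0:
        raise ValueError("hex string must have an even number of digits")
    data = bytes(int(hex_str[i:i + 2], 16) for i in range(0, len(hex_str), 2))
    return b64encode_bytes(data)

print(hex_to_base64("a1"))      # oQ==
print(hex_to_base64("a1b2"))    # obI=
print(hex_to_base64("4d616e"))  # TWFu
```

Working in whole bytes rather than groups of 3 hex digits is what makes the padding count come out right: a leftover of 1 byte always needs "==", and a leftover of 2 bytes always needs "=".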

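Q3: This is ambiguous, and you are right to be confused. A single hex digit is only 4 bits, less than one octet, so there is no whole byte to encode. 'a' could mean the byte 0x0a or the byte 0xa0, and those encode differently. An encoder should reject odd-length hex input rather than guess.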
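To illustrate the ambiguity, here are two plausible readings of 'a' encoded with the standard library, plus what `bytes.fromhex` does with the bare digit. Which reading, if either, a given tool picks is not something this sketch can tell you.

```python
import base64

# A single hex digit is only 4 bits, so "a" is not a whole byte.
print(base64.b64encode(bytes.fromhex("0a")))  # b'Cg==' ("a" read as 0x0A)
print(base64.b64encode(bytes.fromhex("a0")))  # b'oA==' ("a" read as 0xA0)

# bytes.fromhex refuses to guess and raises ValueError instead.
try:
    bytes.fromhex("a")
except ValueError as exc:
    print("ValueError:", exc)
```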