Given a character, how can we transform its UTF-8 encoding to bits in Python?
As an example, `a` corresponds to `01100001`. I am aware of `ord`, but something like `bin(ord('a'))[2:]` returns `1100001`, which does not include the leading `0`. Of course, I can pad it to 8 bits with `zfill(8)`, but I would like to know if there is a more Pythonic way of doing this. For instance, if we do not know in advance how many bits the character requires, the `zfill(8)` approach may no longer work, as the encoding may be 16 bits long.
Python 3 strings contain Unicode code points, not "UTF-8 characters". You can use `ord()` to get the Unicode code point value, and `.encode()` to convert the string to UTF-8 bytes. Then format each byte as 8-digit binary text, and `.join()` the results together.
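A minimal sketch of that approach, assuming the input is an ordinary `str`; the helper name `utf8_bits` is made up for illustration and is not part of any library:

```python
def utf8_bits(text: str) -> str:
    """Return the UTF-8 encoding of text as a string of 0s and 1s."""
    # str.encode() yields a bytes object; iterating over it yields ints 0..255.
    # The format spec "08b" renders each int in binary, zero-padded to 8 digits.
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(utf8_bits("a"))  # 01100001
print(utf8_bits("é"))  # 1100001110101001
print(utf8_bits("€"))  # 111000101000001010101100
```

Because the padding is applied per byte rather than per character, the result is always a multiple of 8 bits, however many bytes the character needs in UTF-8.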