How to convert surrogate pairs into hexadecimal, and vice versa, in Python?

How would I convert characters which are surrogate pairs into hexadecimal?

I've found that using hex() and ord() works for characters that consist of a single code point, such as the emoji "😀". For example:

print(hex(ord("😀")))
# 0x1f600

Similarly, using chr() and int() works for getting the character back from the hexadecimal string:

print(chr(int("0x1f600", 16)))
# 😀

However, as soon as I use a surrogate pair, such as an emoji like "👩🏻", the code throws an error:

print(hex(ord("👩🏻")))
TypeError: ord() expected a character, but string of length 2 found

How would I fix this, and how would I convert such hexadecimal back into a character?

Best answer

Since an exact output format wasn't specified, how about:

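def hexify(s):
    # UTF-32 uses exactly 4 bytes per code point, so each group is one code point.
    # The sep/bytes_per_sep arguments of bytes.hex() need Python 3.8+.
    return s.encode('utf-32-be').hex(sep=' ', bytes_per_sep=4)

def unhexify(s):
    # bytes.fromhex() skips the spaces; decoding reverses hexify().
    return bytes.fromhex(s).decode('utf-32-be')

s = hexify('👩🏻')
print(s)
print(unhexify(s))

Output:

0001f469 0001f3fb
👩🏻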
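
For context on why two groups appear in the output above: the emoji in the question is a sequence of two code points, a base emoji followed by a skin-tone modifier, rather than one character stored as a surrogate pair. That is why ord() reports a string of length 2. A minimal sketch for inspecting this with the standard-library unicodedata module (the helper name describe is only illustrative, not part of the answer's code):

```python
import unicodedata

def describe(s):
    """Return (code point label, Unicode name) for each code point in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in s]

# Written with escapes so the two code points are visible in the source.
woman_light = "\U0001F469\U0001F3FB"  # base emoji + skin-tone modifier

print(len(woman_light))  # number of code points, not of visible glyphs
for label, name in describe(woman_light):
    print(label, name)
```

Any string can be passed to describe() in the same way to see how many code points a visible glyph is really made of.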

Or similar to your original code:

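def hexify(s):
    # One hex string per code point in s.
    return [hex(ord(c)) for c in s]

def unhexify(L):
    # int(n, 16) accepts the '0x' prefix that hex() produces.
    return ''.join([chr(int(n,16)) for n in L])

s = hexify('👩🏻')
print(s)
print(unhexify(s))

Output:

['0x1f469', '0x1f3fb']
👩🏻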
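
Both versions above work on whole code points; neither produces UTF-16 surrogates as such, because Python 3 strings are sequences of code points. If the actual surrogate code units are what is wanted (for example to match what JavaScript or Java would show), the same encode-and-hex idea could be applied with 'utf-16-be' and 2-byte groups. The following is only a sketch under that assumption; the names to_utf16_hex, from_utf16_hex and surrogates are made up for illustration, and the sep/bytes_per_sep arguments need Python 3.8+:

```python
def to_utf16_hex(s):
    """Hex-dump the UTF-16 code units of s, one 4-digit group per unit."""
    return s.encode("utf-16-be").hex(sep=" ", bytes_per_sep=2)

def from_utf16_hex(h):
    """Inverse of to_utf16_hex: rebuild the string from hex code units."""
    return bytes.fromhex(h).decode("utf-16-be")

def surrogates(cp):
    """Compute the (high, low) surrogate pair for a code point above U+FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    # Top 10 bits go into the high surrogate, bottom 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

grinning = "\U0001F600"
units = to_utf16_hex(grinning)
print(units)                                   # d83d de00
print(from_utf16_hex(units) == grinning)       # True
print([hex(u) for u in surrogates(0x1F600)])   # ['0xd83d', '0xde00']
```

For the two-code-point emoji from the question, to_utf16_hex() would give four groups, one surrogate pair per code point.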