I have a UTF-16-BE encoded string:
utf16be = '\x0623\x0631\x0646\x0628'
print repr(utf16be)
> '\x0623\x0631\x0646\x0628'
I need to know whether it is a 1-byte or 2-byte encoding. I tried the snippet below:
for c in utf16be:
    c_ord = ord(c)
    if c_ord >= 256:
        print 'Its a 2-byte (or more) encoded string'
        break
But that won't work: I expected utf16be[0] to be '\x0623', but it is actually '\x06':
for c in utf16be:
    print repr(c)
> '\x06'
> '2'
> '3'
> '\x06'
> '3'
> '1'
> '\x06'
> '4'
> '6'
> '\x06'
> '2'
> '8'
So what is the best practice for checking whether I have a 2-byte encoded string?
Use the chardet package to guess the encoding.
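As a rough sketch of what is going on in the question: the \x escape consumes exactly two hex digits, so '\x0623' is the character '\x06' followed by the literal characters '2' and '3', which matches the output shown. The snippet below assumes the intended text was the four code points U+0623, U+0631, U+0646, U+0628 and writes them with \u escapes instead. The helper needs_more_than_one_byte is a hypothetical name, not something from the question or from chardet. The chardet call is optional and assumes the package is installed; its guess on a short input without a byte order mark may be unreliable, so no expected result is shown for it.

```python
# -*- coding: utf-8 -*-
# \x takes exactly two hex digits, so this is '\x06' followed by '2' and '3'.
broken = u'\x0623'
print(len(broken))  # 3 characters, not 1

# \u takes four hex digits, giving one code point per escape.
# Assumption: these four code points are the text the question intended.
text = u'\u0623\u0631\u0646\u0628'


def needs_more_than_one_byte(s):
    """Hypothetical helper: True if any code point in the text s is above 0xFF."""
    return any(ord(ch) > 0xFF for ch in s)


# Encoding produces the actual UTF-16-BE bytes: 06 23 06 31 06 46 06 28.
utf16be = text.encode('utf-16-be')
print(len(utf16be))  # 8 bytes for 4 characters

# The ord() check is meaningful on decoded text, not on the raw bytes.
decoded = utf16be.decode('utf-16-be')
print(needs_more_than_one_byte(decoded))

# Optional: ask chardet for a guess when the encoding is unknown.
try:
    import chardet  # third-party package, assumed to be installed
except ImportError:
    chardet = None

if chardet is not None:
    # detect() returns a dict with 'encoding' and 'confidence' keys.
    print(chardet.detect(utf16be))
```

The design point is that the ord(c) >= 256 test only makes sense after decoding to a text string; iterating over an undecoded byte string can never yield a value above 255.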