Get character code of specific encoding from string

Asked by sollniss at 27 June 2025 at 03:44

I'm trying to get the Shift JIS character code from a Unicode string. I'm not very knowledgeable in Python, but here is what I have tried so far:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import *

data="臍"
udata=data.decode("utf-8")
data=udata.encode("shift-jis").decode("shift-jis")
code=unpack(data, "Q")
print code

But I get a UnicodeEncodeError: 'ascii' codec can't encode character u'\u81cd' in position 0: ordinal not in range(128). The string is always a single character.

There are 2 best solutions below

tdelaney On 12 February 2016 at 05:02

In Python 2, when you write a string literal in a UTF-8 encoded source file, you can leave it encoded (`data = "臍"`) or you can have Python decode it into a unicode string for you when the program is parsed (`data = u"臍"`). The second option is the normal way to create strings when your source file is UTF-8 encoded.
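
For comparison, Python 3 makes this distinction explicit: a plain literal is already a Unicode str, and what Python 2 gave you for a plain "臍" literal corresponds to a Python 3 bytes object. A minimal Python 3 sketch, assuming the source file is UTF-8 (the Python 3 default):

```python
# Python 3 sketch: a plain literal is already a Unicode string (type str).
text = "臍"
print(type(text).__name__, hex(ord(text)))  # str 0x81cd

# The UTF-8 encoded form is a bytes object, like Python 2's plain "臍" str.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xe8\x87\x8d'

# Decoding the UTF-8 bytes gives back the same Unicode string.
assert utf8_bytes.decode("utf-8") == text
```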

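When you tried to convert to Shift JIS, you ended up decoding the Shift JIS bytes back into a Python unicode string, so there were no bytes left to unpack. When you then called unpack, you passed the arguments in the wrong order (the format string comes first, then the data), which is where the UnicodeEncodeError comes from: Python 2 tried to convert u'\u81cd' into an ASCII format string. You also asked for "Q" (unsigned long long, 8 bytes) when you really want "H" (unsigned short, 2 bytes). Since Shift JIS stores the lead byte first, use the big-endian form ">H".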
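
The two mistakes can be seen side by side in a short Python 3 sketch (Python 3 rather than the Python 2 of the thread; the variable names are illustrative). It keeps the encoded bytes instead of decoding them again, shows that "Q" needs more bytes than the character has, and unpacks with ">H" using the correct argument order:

```python
import struct

# Keep the Shift JIS bytes; do not decode them back into a str.
sjis = "臍".encode("shift_jis")
print(sjis, len(sjis))  # b'\xe4`' 2

# Decoding again would just return the original Unicode string.
assert sjis.decode("shift_jis") == "臍"

# "Q" needs 8 bytes, but this character is only 2 bytes in Shift JIS.
print(struct.calcsize(">Q"), struct.calcsize(">H"))  # 8 2
try:
    struct.unpack(">Q", sjis)  # format first, then the data
except struct.error as exc:
    print("Q fails:", exc)

# "H" matches the 2-byte length; ">" makes the lead byte the high byte.
(code,) = struct.unpack(">H", sjis)
print(hex(code))  # 0xe460
```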

Here are two samples that get the Shift JIS code of the character:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import unpack

# here we have a byte string (str) that holds the UTF-8 encoded character
data = "臍"
jis_data = data.decode('utf-8').encode("shift-jis")
code = unpack(">H", jis_data)[0]  # ">H": big-endian unsigned short (2 bytes)
print repr(data), repr(jis_data), code, hex(code)

# here Python decodes the UTF-8 encoded character for us
data = u"臍"
jis_data = data.encode("shift-jis")
code = unpack(">H", jis_data)[0]
print repr(data), repr(jis_data), code, hex(code)

Which results in

'\xe8\x87\x8d' '\xe4`' 58464 0xe460
u'\u81cd' '\xe4`' 58464 0xe460
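
Python 3 has no separate u"" literal to choose between, so the two samples collapse into one. A minimal Python 3 port of the same approach (an adaptation, not part of the original answer), with a second line that starts from raw UTF-8 bytes like the first sample:

```python
from struct import unpack

data = "臍"                           # already a Unicode str in Python 3
jis_data = data.encode("shift_jis")   # b'\xe4`'
code = unpack(">H", jis_data)[0]      # big-endian unsigned short
print(ascii(data), jis_data, code, hex(code))  # '\u81cd' b'\xe4`' 58464 0xe460

# Starting from UTF-8 bytes (the equivalent of Python 2's plain "臍" str).
raw = b"\xe8\x87\x8d"
code_from_raw = unpack(">H", raw.decode("utf-8").encode("shift_jis"))[0]
assert code_from_raw == code
```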

mhawke On 12 February 2016 at 04:00

That character is represented in Shift JIS as the two-byte sequence 0xE4 0x60:

>>> data = u'\u81cd'
>>> data_shift_jis = data.encode('shift-jis')
>>> data_shift_jis
'\xe4`'
>>> hex(ord('`'))
'0x60'

So '\xe4\x60' (displayed as '\xe4`' because 0x60 is the printable backtick character) is u'\u81cd' encoded as Shift JIS.
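
Because the lead byte comes first, the character code is just the two bytes read as a big-endian integer: (0xE4 << 8) | 0x60 = 0xE460. A Python 3 sketch of that arithmetic, plus a hypothetical helper sjis_code (not from either answer) that uses int.from_bytes so it also covers characters Shift JIS stores in a single byte, such as ASCII letters, where unpacking with ">H" would fail:

```python
def sjis_code(char):
    """Return the Shift JIS code of one character as an int (hypothetical helper)."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    encoded = char.encode("shift_jis")  # 1 or 2 bytes, lead byte first
    return int.from_bytes(encoded, "big")

# The manual arithmetic for 臍: lead byte 0xE4, trail byte 0x60 (the backtick).
assert (0xE4 << 8) | 0x60 == 0xE460
assert b"\xe4`" == b"\xe4\x60"

print(hex(sjis_code("臍")))  # 0xe460
print(hex(sjis_code("A")))   # 0x41, a single-byte character
```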
