I'm trying to get the Shift-JIS character code from a Unicode string. I'm not very knowledgeable in Python, but here is what I have tried so far:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import *
data="臍"
udata=data.decode("utf-8")
data=udata.encode("shift-jis").decode("shift-jis")
code=unpack(data, "Q")
print code
But I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u81cd' in position 0: ordinal not in range(128)
The string is always a single character.
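In Python 2, when you create a UTF-8 encoded string, you can leave it encoded (data = "臍") or you can have Python decode it into a unicode string for you when the program is parsed (data = u"臍"). The second option is the normal way to create strings when your source file is UTF-8 encoded. When you tried to convert to Shift-JIS, you ended up decoding the Shift-JIS bytes straight back into a Python unicode string, so you never kept the encoded bytes. And when you tried to unpack, you asked for "Q" (unsigned long long, 8 bytes) when you really want "H" (unsigned short, 2 bytes), and in big-endian order (">H"), because Shift-JIS puts the lead byte first while a plain "H" uses the machine's native byte order. Finally, struct.unpack takes the format first and the buffer second. unpack(data, "Q") passes your unicode string as the format, and struct's attempt to convert that format to ASCII is what raises the UnicodeEncodeError.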
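Putting those fixes together, here is a minimal sketch assuming Python 3, where str is already Unicode, so neither the decode("utf-8") step nor the u"" prefix is needed. The function name shift_jis_code is hypothetical. It encodes once, keeps the bytes, and unpacks them big-endian.

```python
import struct


def shift_jis_code(char: str) -> int:
    """Return the Shift-JIS code of a single character as an integer.

    Assumes Python 3: str is already Unicode, so no decode step is needed.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Encode to Shift-JIS bytes and keep them; do NOT decode them back to str.
    raw = char.encode("shift_jis")
    if len(raw) == 2:
        # Double-byte character: lead byte comes first, so unpack big-endian.
        (code,) = struct.unpack(">H", raw)
    else:
        # ASCII and half-width katakana take a single byte in Shift-JIS.
        (code,) = struct.unpack(">B", raw)
    return code


if __name__ == "__main__":
    for ch in ("臍", "あ", "A"):
        # Print only ASCII so this works regardless of the terminal encoding.
        print(f"U+{ord(ch):04X} -> 0x{shift_jis_code(ch):04X}")
```

Using ">H" rather than "H" matters on little-endian machines, where native order would swap the two bytes. If you would rather skip struct, int.from_bytes(raw, "big") gives the same value and handles both the one-byte and two-byte cases.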
Following are two samples to get information on the character
Which results in