jsonpickle/json function input utf-8, output unicode?


I wrote the following two functions for storing and retrieving any Python object (built-in or user-defined) with a combination of json and jsonpickle (in Python 2.7):

def save(kind, obj):
    pickled = jsonpickle.encode(obj)
    filename = DATA_DESTINATION[kind]  # file destination to store json
    if os.path.isfile(filename):
        open(filename, 'w').close()
    with open(filename, 'w') as f:
        json.dump(pickled, f)

def retrieve(kind):
    filename = DATA_DESTINATION[kind]  # file destination the json was stored in
    if os.path.isfile(filename):
        with open(filename, 'r') as f:
            pickled = json.load(f)
            unpickled = jsonpickle.decode(pickled)
            print unpickled

I haven't tested these two functions with user-defined objects, but when I save() a built-in dictionary of strings (e.g. {'Adam': 'Age 19', 'Bill': 'Age 32'}) and then retrieve the same file, I get the same dictionary back in unicode: {u'Adam': u'Age 19', u'Bill': u'Age 32'}. I thought json/jsonpickle encoded to utf-8 by default, what's the deal here?

[UPDATE]: Removing all jsonpickle encoding/decoding does not affect the output; it is still in unicode. This seems like an issue with json? Perhaps I'm doing something wrong.

There are 4 best solutions below

6
On BEST ANSWER

You can encode the unicode string after calling loads().

json.loads('"\\u79c1"').encode('utf-8')

Now you have a normal (byte) string again.
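
The one-liner above covers a single string. For a nested structure such as the dictionary in the question, the same idea can be applied recursively. What follows is only a sketch: the helper name encode_strings is made up for illustration and is not part of json or jsonpickle. On Python 2.7 it turns every unicode object into a UTF-8 str; on Python 3 the same code would produce bytes objects instead.

```python
import json

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def encode_strings(obj, encoding='utf-8'):
    """Hypothetical helper: recursively encode every text string in obj."""
    if isinstance(obj, TEXT_TYPE):
        return obj.encode(encoding)
    if isinstance(obj, list):
        return [encode_strings(item, encoding) for item in obj]
    if isinstance(obj, dict):
        return dict((encode_strings(key, encoding),
                     encode_strings(value, encoding))
                    for key, value in obj.items())
    return obj  # numbers, booleans and None pass through unchanged


loaded = json.loads('{"Adam": "Age 19", "Bill": "Age 32"}')
plain = encode_strings(loaded)
```

Whether this conversion is worth doing depends on what consumes the data; code that accepts unicode objects can simply use the result of json.loads() as it is.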

0
On

The problem is that json, as a serialization format, is not expressive enough to carry information about the original type of strings. In other words, if you have a json string "a", you can't tell whether it originated from a python string "a" or from a python unicode string u"a".

Indeed, the documentation of the json module describes the ensure_ascii option. Basically, depending on where you are going to write the generated json, you might tolerate a unicode string, or you might need an ascii string with all non-ascii characters properly escaped.

For example:

>>> import json
>>> json.dumps({'a':'b'})
'{"a": "b"}'
>>> json.dumps({'a':u'b'}, ensure_ascii=False)
u'{"a": "b"}'
>>> json.dumps({'a':u'b'})
'{"a": "b"}'
>>> json.dumps({u'a':'b'})
'{"a": "b"}'
>>> json.dumps({'a':u'\xe0'})
'{"a": "\\u00e0"}'
>>> json.dumps({'a':u'\xe0'}, ensure_ascii=False)
u'{"a": "\xe0"}'

As you can see, depending on the value of ensure_ascii you end up with an ascii json string or a unicode one, but the components of the original objects are all flattened to the same common encoding. Look at the {"a": "b"} case in particular.

jsonpickle simply makes use of json as its underlying serialization engine, adding no extra metadata to keep track of the original string types, so you are in fact losing information along the way.

>>> jsonpickle.encode({'a': 'b'})
'{"a": "b"}'
>>> jsonpickle.encode({'a': u'b'})
'{"a": "b"}'
>>> jsonpickle.encode({u'a': 'b'})
'{"a": "b"}'
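
A round trip makes the loss visible. This sketch uses only the standard json module and assumes nothing beyond it. On Python 2.7 the first literal is a byte str and the second is unicode; on Python 3 both are already unicode str, so the distinction does not exist in the first place.

```python
import json

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3

from_plain = json.dumps({'a': 'b'})
from_unicode = json.dumps({u'a': u'b'})

# Both inputs serialize to the same JSON text, so nothing in the
# document records which string type the values originally had.
same_text = (from_plain == from_unicode)

# Decoding therefore has to pick one type, and json picks the text type.
decoded = json.loads(from_plain)
all_text = all(isinstance(item, TEXT_TYPE)
               for pair in decoded.items() for item in pair)
```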
3
On

I thought json ... encoded by default to utf-8, what's the deal here?

No, by default it encodes to ASCII (ensure_ascii=True escapes every non-ASCII character as \uXXXX). And it decodes to unicode.

>>> json.dumps(u'私')
'"\\u79c1"'
>>> json.loads('"\\u79c1"')
u'\u79c1'
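
If the goal is UTF-8 bytes rather than ASCII escapes, one option is to switch off the escaping and encode explicitly. A minimal sketch using only the standard json module; the variable names are arbitrary:

```python
import json

text = u'\u79c1'  # the same character as in the session above

escaped = json.dumps(text)                    # default: ensure_ascii=True
is_ascii = all(ord(ch) < 128 for ch in escaped)

raw = json.dumps(text, ensure_ascii=False)    # non-ASCII characters kept as-is
utf8_bytes = raw.encode('utf-8')              # explicit UTF-8 encoding

# Decoding the bytes and loading the JSON gives back the original text.
round_trip = json.loads(utf8_bytes.decode('utf-8'))
```

Either form loads back to the same unicode value; the choice only affects how the JSON looks on disk or on the wire.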
0
On
import jsonpickle
import json

jsonpickle.set_preferred_backend('json')
jsonpickle.set_encoder_options('json', ensure_ascii=False)
print(jsonpickle.encode({"value": "значение"}))

{"value": "значение"}