I'm having trouble applying a function to all leaves of a dict (loaded from a JSON file) in Python. The text has been badly encoded and I want to use the ftfy module to fix it.
Here is my function:
import ftfy.bad_codecs  # registers the 'sloppy-windows-1252' codec

def recursive_decode_dict(e):
    # Walk dicts and lists recursively; re-decode every string leaf.
    if type(e) is dict:
        print('Dict: %s' % e)
        return {k: recursive_decode_dict(v) for k, v in e.items()}
    elif type(e) is list:
        print('List: %s' % e)
        return list(map(recursive_decode_dict, e))
    elif type(e) is str:
        print('Str: %s' % e)
        print('Transformed str: %s' % e.encode('sloppy-windows-1252').decode('utf-8'))
        return e.encode('sloppy-windows-1252').decode('utf-8')
    else:
        return e
I call it this way:
import json

with open('test.json', 'r', encoding='utf-8') as f1:
    json_content = json.load(f1)
    recursive_decode_dict(json_content)
with open('out.json', 'w', encoding='utf-8') as f2:
    json.dump(json_content, f2, indent=2)
The console output looks fine:
> python fix_encoding.py
List: [{'fields': {'field1': 'the European-style cafÃ© into a '}}]
Dict: {'fields': {'field1': 'the European-style cafÃ© into a '}}
Dict: {'field1': 'the European-style cafÃ© into a '}
Str: the European-style cafÃ© into a
Transformed str: the European-style café into a
But my output file is not fixed:
[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]
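The function itself works; the problem is in how it is called. `recursive_decode_dict` builds and returns new dicts, lists and strings (strings are immutable, so they cannot be fixed in place), and the call discards that return value, so `json.dump` still writes the original `json_content`. Here is a minimal sketch of keeping the result. The name `recursive_decode` and the `codec` parameter are mine, not from the question, and I use the standard-library `cp1252` codec as a stand-in so the snippet runs without ftfy; after `import ftfy.bad_codecs` you can pass `'sloppy-windows-1252'` instead. The `except UnicodeError` fallback is also my assumption about what should happen to strings that are not mojibake.

```python
import json

def recursive_decode(e, codec='cp1252'):
    # Assumption: 'cp1252' is a stdlib stand-in for the question's
    # 'sloppy-windows-1252' codec, which needs `import ftfy.bad_codecs`.
    if isinstance(e, dict):
        return {k: recursive_decode(v, codec) for k, v in e.items()}
    if isinstance(e, list):
        return [recursive_decode(v, codec) for v in e]
    if isinstance(e, str):
        try:
            return e.encode(codec).decode('utf-8')
        except UnicodeError:
            return e  # not recoverable this way: leave the string unchanged
    return e

# Same shape as the sample data in the question, shortened.
broken = json.loads('[{"fields": {"field1": "caf\\u00c3\\u00a9"}}]')

# The important part: keep the return value instead of discarding it.
fixed = recursive_decode(broken)

print(json.dumps(fixed, ensure_ascii=False, indent=2))
```

Note that `json.dump` escapes non-ASCII characters by default, so even the fixed text is written as `caf\u00e9` unless you pass `ensure_ascii=False`. Either form loads back to the same string.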
If it's JSON data you're massaging, you can instead hook into the JSON decoder and fix strings as you encounter them.
This does require using the slower pure-Python JSON parser, but that is unlikely to matter for a one-off conversion.
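Here is a rough sketch of what such a hook could look like. It relies on `json.decoder.py_scanstring` and `json.scanner.py_make_scanner`, which exist in CPython's `json` package but are implementation details rather than documented API, so treat this as an illustration under that assumption. The names `fix_mojibake`, `fixing_scanstring` and `FixingDecoder` are hypothetical, and `cp1252` again stands in for ftfy's `'sloppy-windows-1252'` codec so the snippet runs with the standard library alone.

```python
import json
from json import decoder, scanner

def fix_mojibake(s, codec='cp1252'):
    # Assumption: stdlib 'cp1252' stands in for 'sloppy-windows-1252'.
    try:
        return s.encode(codec).decode('utf-8')
    except UnicodeError:
        return s  # leave strings that cannot be re-decoded untouched

def fixing_scanstring(*args, **kwargs):
    # Parse the JSON string as usual, then repair the decoded text.
    s, end = decoder.py_scanstring(*args, **kwargs)
    return fix_mojibake(s), end

class FixingDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.parse_string = fixing_scanstring
        # Rebuild the scanner in pure Python so it uses parse_string above;
        # the C scanner would ignore the override.
        self.scan_once = scanner.py_make_scanner(self)

data = json.loads('[{"fields": {"field1": "caf\\u00c3\\u00a9"}}]',
                  cls=FixingDecoder)
print(data)
```

With a file you would pass the same class to `json.load(f1, cls=FixingDecoder)`. As far as I can tell from the CPython source, object keys are parsed through a separate path, so this sketch only repairs string values, not keys.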