Recursively transform dict leaves in Python

I'm having trouble applying a function to all leaves of a dict (loaded from a JSON file) in Python. The text has been badly encoded and I want to use the ftfy module to fix it.

Here is my function:

import json
import ftfy  # importing ftfy registers the 'sloppy-windows-1252' codec


def recursive_decode_dict(e):
    if type(e) is dict:
        print('Dict: %s' % e)
        return {k: recursive_decode_dict(v) for k, v in e.items()}

    elif type(e) is list:
        print('List: %s' % e)
        return list(map(recursive_decode_dict, e))

    elif type(e) is str:
        print('Str: %s' % e)
        print('Transformed str: %s' % e.encode('sloppy-windows-1252').decode('utf-8'))
        return e.encode('sloppy-windows-1252').decode('utf-8')

    else:
        return e

I call it this way:

with open('test.json', 'r', encoding='utf-8') as f1:
    json_content = json.load(f1)
    recursive_decode_dict(json_content)


with open('out.json', 'w', encoding='utf-8') as f2:
    json.dump(json_content, f2, indent=2)

The console output looks fine:

  > python fix_encoding.py 
List: [{'fields': {'field1': 'the European-style cafÃ© into a '}}]
Dict: {'fields': {'field1': 'the European-style cafÃ© into a '}}
Dict: {'field1': 'the European-style cafÃ© into a '}
Str: the European-style cafÃ© into a 
Transformed str: the European-style café into a 

But my output file is not fixed:

[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]
There is 1 best solution below

If it's JSON data you're massaging, you can instead hook into the JSON decoder and fix strings as you encounter them.

This does require the slower pure-Python JSON parser, but that's likely not an issue for a one-off conversion.

import json
import ftfy


decoder = json.JSONDecoder()


def ftfy_parse_string(*args, **kwargs):
    string, length = json.decoder.scanstring(*args, **kwargs)
    string = string.encode("sloppy-windows-1252").decode("utf-8")
    return (string, length)


decoder.parse_string = ftfy_parse_string
decoder.scan_once = json.scanner.py_make_scanner(decoder)

print(decoder.decode(r"""[
  {
    "fields": {
      "field1": "the European-style cafÃ© into a "
    }
  }
]"""))

outputs

[{'fields': {'field1': 'the European-style café into a '}}]
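
As for why the recursive version in the question leaves out.json unchanged: the function builds and returns new containers, but the call discards that return value, so the original json_content is what gets dumped. Here is a minimal sketch of that fix, assuming the text is UTF-8 that was misread as Windows-1252. It uses the standard-library cp1252 codec so it runs without ftfy (swap in "sloppy-windows-1252" if ftfy is installed), and fix_text and fix_leaves are names made up for this example:

```python
import json


def fix_text(s):
    # Hypothetical helper: undo UTF-8 text that was mis-decoded as Windows-1252.
    # Assumption: strings that cannot be round-tripped are left unchanged.
    try:
        return s.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s


def fix_leaves(e):
    # Returns a NEW structure; the input is not modified in place.
    # Only values are transformed, dict keys are kept as they are.
    if isinstance(e, dict):
        return {k: fix_leaves(v) for k, v in e.items()}
    if isinstance(e, list):
        return [fix_leaves(v) for v in e]
    if isinstance(e, str):
        return fix_text(e)
    return e


# \u00c3\u00a9 is the mojibake form of "é", as in the question's out.json
broken = json.loads('[{"fields": {"field1": "the European-style caf\\u00c3\\u00a9 into a "}}]')
fixed = fix_leaves(broken)  # keep the return value instead of discarding it
fixed_json = json.dumps(fixed, indent=2, ensure_ascii=False)
print(fixed_json)
```

In the question's script, the equivalent change would be json_content = recursive_decode_dict(json_content) before calling json.dump. Passing ensure_ascii=False is optional; it only makes the fixed characters appear literally in the file rather than as \uXXXX escapes.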