How can I navigate a JSON file that has an array in it to select the values using ObjectPath in Python?

I have multiple JSON files to process and can't seem to access the specific text ("distracter") shown below. This is an example of a line in one of the files:

{"extracted":"high","nameid":3201932,"users":{"name":[{"ids":[28,37],"text":"distracter"}],"symbols":[]}}

Below is the code that I wrote that returns an empty result:

data = []
with open(fileName, 'r') as file_to_read:
    for line in file_to_read:
        data.append(json.loads(line))
        json_tree = objectpath.Tree(data)
        text_result= tuple(json_tree.execute('$.users.name[@.text]'))
return text_result

On BEST ANSWER

I think there are two main problems here:

  1. The selector query seems wrong. I tried '$.users.name.text' instead and it worked for me (using Python 3 and objectpath)
  2. The function isn't building up the list of names correctly

Try something like this instead:

import json
import objectpath


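def get_names_tree(data):
    # Query a single decoded record for every "text" value under users.name.
    tree = objectpath.Tree(data)
    return tuple(tree.execute('$.users.name.text'))


def load_data(file_name):
    names = []

    # The file holds one JSON object per line, so decode line by line.
    with open(file_name) as fh:
        for line in fh:
            data = json.loads(line)
            # Accumulate this line's names rather than overwriting them.
            names.extend(get_names_tree(data))

    return names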

In the loop above we build up a list of names, rather than a list of decoded entities. In your version the text_result variable is reassigned on every iteration, so only the value from the last iteration is returned.

You might also be able to speed this up by extracting the data in pure Python:

def get_names_careful(data):
    return tuple(
        name['text'] for name in
        data.get('users', {}).get('name', [])
        if 'text' in name
    )


def get_names(data):
    return tuple(name['text'] for name in data['users']['name'])

The first is careful about not raising errors with missing data, but if you know your data is always the right shape, you could try the second.
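
To make the difference concrete, here is a small sketch that runs both helpers on the sample line from the question and on a record with no users key. The `incomplete` record is made up for illustration and is not taken from the original data.

```python
import json


def get_names_careful(data):
    # Tolerates a missing 'users' or 'name' key and entries without 'text'.
    return tuple(
        name['text'] for name in
        data.get('users', {}).get('name', [])
        if 'text' in name
    )


def get_names(data):
    # Assumes every record has users -> name -> [{'text': ...}, ...].
    return tuple(name['text'] for name in data['users']['name'])


# The sample line from the question.
line = ('{"extracted":"high","nameid":3201932,"users":{"name":'
        '[{"ids":[28,37],"text":"distracter"}],"symbols":[]}}')
record = json.loads(line)

# Hypothetical record with no 'users' key, for illustration only.
incomplete = {"extracted": "low", "nameid": 1}

complete_careful = get_names_careful(record)        # ('distracter',)
complete_careless = get_names(record)               # ('distracter',)
incomplete_careful = get_names_careful(incomplete)  # ()

try:
    get_names(incomplete)
    careless_failed = False
except KeyError:
    # The careless version raises as soon as a key is missing.
    careless_failed = True
```

On well-formed records the two return the same tuple; they only differ when a key is absent, where the careful version returns an empty tuple and the careless one raises KeyError.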

In my testing, the careful version was about 15x faster than the objectpath approach, and the careless version about 20x faster.
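
Numbers like these depend on the machine, the Python version and the data, so it is worth measuring on your own files. Below is a minimal sketch using the standard library's timeit module. It times only the two pure-Python helpers on the sample record; an objectpath-based function could be timed the same way by passing it to timeit.timeit. The iteration count is an arbitrary choice.

```python
import json
import timeit


def get_names_careful(data):
    return tuple(
        name['text'] for name in
        data.get('users', {}).get('name', [])
        if 'text' in name
    )


def get_names(data):
    return tuple(name['text'] for name in data['users']['name'])


# The sample line from the question, decoded once up front so that
# only the extraction step is timed.
line = ('{"extracted":"high","nameid":3201932,"users":{"name":'
        '[{"ids":[28,37],"text":"distracter"}],"symbols":[]}}')
record = json.loads(line)

number = 10_000  # arbitrary number of repetitions

careful_time = timeit.timeit(lambda: get_names_careful(record), number=number)
careless_time = timeit.timeit(lambda: get_names(record), number=number)

print(f"careful:  {careful_time:.4f}s for {number} calls")
print(f"careless: {careless_time:.4f}s for {number} calls")
```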