ijson: How to use ijson to retrieve a dict/list element (from a file or from a string)?


I am trying to use ijson to retrieve an element from a JSON object.

The JSON string is stored in a file, and that file contains only this content:

{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}

(This string is heavily simplified; the real content is over 2 GB long.)

I need help doing the following:

1/ Open that file,

2/ Use ijson to load that JSON data into some object, and

3/ Retrieve the list "[1,2,3]" from that object.

Why not just use the following simple code?

import json

my_json = json.loads('{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}')
my_list = my_json['categoryAspects']

Well, imagine that this "[1,2,3]" list is actually over 2 GB long, so json.loads() will not work (it needs the whole document in memory at once, so it would just crash).

I tried a lot of combinations (A LOT) and they all failed. Here are some examples of what I tried:

ij = ijson.items(fd, '') -> this does not raise any error, but the ones below do:

my_list = ijson.items(fd,'').next()
-> error = '_yajl2.items' object has no attribute 'next'

my_list = ijson.items(fd,'').items()
-> error = '_yajl2.items' object has no attribute 'items'

my_list = ij['categoryAspects']
-> error = '_yajl2.items' object is not subscriptable

BEST ANSWER

This should work:

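import ijson

# 'rb': open in binary read mode ('b' alone is not a valid mode)
with open('your_file.json', 'rb') as f:
    # 'categoryAspects.item' selects each element of the categoryAspects array
    for n in ijson.items(f, 'categoryAspects.item'):
        print(n)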

Additionally, if you know your numbers are "normal" numbers (i.e. you don't need the exact precision of Decimal), you can pass use_float=True as an extra argument to items for extra speed (ijson.items(f, 'categoryAspects.item', use_float=True) in the code above). More details about it are in the documentation.

EDIT: Answering a follow-up question: to get a list with all the numbers, you can build one directly from the iterator returned by items like so:

import ijson

with open('your_file.json', 'rb') as f:
    numbers = list(ijson.items(f, 'categoryAspects.item'))

Keep in mind that if there are too many numbers you might still run out of memory, which defeats the purpose of streaming parsing.

EDIT2: An alternative to a list is a numpy array holding all the numbers, which should give a more compact in-memory representation if you need all of them at once:

import ijson
import numpy

with open('your_file.json', 'rb') as f:
    numbers = numpy.fromiter(
                ijson.items(f, 'categoryAspects.item', use_float=True),
                dtype=float  # or int, if these are integers
              )