I have some big JSON files with the following structure:
[
    {
        "url": "",
        "publishedDate": "",
        "modifiedDate": "",
        "title": "",
        "summary": "",
        "content": "",
        "language": "",
        "section": "",
        "tags": [],
        "authors": []
    },
    {
        "url": "",
        "publishedDate": "",
        "modifiedDate": "",
        "title": "",
        "summary": "",
        "content": "",
        "language": "",
        "section": "",
        "tags": [],
        "authors": []
    },
    ...
]
But parsing these big JSON files with Python's default json
library ends up consuming too much memory, so I've searched for alternatives. One of them is ijson,
which is supposed to use no more memory than the size of the file itself.
The problem is that I don't know how to use it (I'm new to Python, coming from Java), and most tutorials I've found don't parse JSON shaped like the example above. How can I make ijson yield a dictionary
for each item in the top-level list?
Thanks in advance.
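
For reference, here is a minimal sketch of the kind of usage being asked about. It assumes ijson's `items` function with the prefix `"item"`, which selects each element of a top-level array. The helper name `iter_articles`, the sample data, and the temporary file are hypothetical and only there for illustration. The `json.load` fallback is there purely so the snippet runs without ijson installed; it does not stream and so gives no memory benefit.

```python
import json
import tempfile

try:
    import ijson  # third-party package: pip install ijson
except ImportError:
    ijson = None


def iter_articles(path):
    """Yield one dict per element of the top-level JSON array in `path`."""
    with open(path, "rb") as f:
        if ijson is not None:
            # The prefix "item" matches each element of the top-level array,
            # so objects are built and yielded one at a time.
            yield from ijson.items(f, "item")
        else:
            # Fallback only (NOT streaming): loads the whole file at once.
            yield from json.load(f)


# Hypothetical sample file with the same shape as the structure above.
sample = [
    {"url": "a", "title": "first", "tags": [], "authors": []},
    {"url": "b", "title": "second", "tags": ["x"], "authors": []},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

titles = [article["title"] for article in iter_articles(tmp.name)]
print(titles)
```

Because `iter_articles` is a generator, each dictionary can be processed and discarded before the next one is read, as long as the loop does not collect every item into a list.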