I'm trying to open a bz2 file and read the JSON file contained inside. My current implementation looks like this:
with bz2.open(bz2_file_path, 'rb') as f:
    json_content = f.read()
json_df = pd.read_json(json_content.decode('utf-8'), lines = True)
I need to repeat this process many times, and the with block is taking up the bulk of the time. Is there a way I can speed this process up?
The following variation of your code won't necessarily read all the data into memory at once. Passing encoding to bz2.open() (in text mode) allows the decoding to be done on the fly, and pandas.read_json() can accept a file-like object to read incrementally.
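A minimal sketch of that approach, assuming the file holds one JSON object per line (as lines=True in your code implies). The helper name read_bz2_json_lines is hypothetical and only there for illustration; bz2_file_path is the same variable as in your code. Whether this is actually faster will depend on your data, so it is worth timing on a real file.

```python
import bz2

import pandas as pd


def read_bz2_json_lines(bz2_file_path):
    """Hypothetical helper: load a bz2-compressed JSON Lines file into a DataFrame."""
    # 'rt' opens the archive in text mode, which is what lets encoding=
    # take effect: bytes are decompressed and decoded as they are read.
    with bz2.open(bz2_file_path, 'rt', encoding='utf-8') as f:
        # read_json accepts the open file object directly, so there is no
        # separate f.read() / .decode() step in your own code.
        return pd.read_json(f, lines=True)
```

It would be called as json_df = read_bz2_json_lines(bz2_file_path).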