I've got 30,000 folders, and each folder contains 5 bz2 files of JSON data.
I'm trying to use os.walk() to loop through the directory tree, decompress each compressed file, and save the result in its original directory.
import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for dirpath, dirnames, files in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')
        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            # Copy in 100 KiB chunks until read() returns b''
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
I'm getting the following error when running the code:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
I've read that the decompressor can have issues on macOS. Is that the problem, or am I missing something else?
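It looks like you might be trying to decompress your already decompressed results. Your code writes each .decompressed file into the same directory tree that os.walk() traverses, so a second run picks those outputs up as inputs, and they are not valid bz2 streams. You should filter them out.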
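Here is a minimal sketch of that filter, not a definitive fix. It assumes your compressed files all end in .bz2 (the question doesn't show the actual file names, so adjust the suffix if yours differ). The function name decompress_tree and its suffix parameter are made up for illustration. Anything that doesn't match the suffix is skipped, which covers earlier output files as well as any other non-bz2 files in those folders.

```python
import bz2
import os
import shutil


def decompress_tree(root, suffix=".bz2"):
    """Decompress every file under root whose name ends with suffix.

    Hypothetical helper: assumes compressed files are identified by suffix.
    Returns the list of paths that were written.
    """
    written = []
    for dirpath, dirnames, files in os.walk(root):
        for filename in files:
            # Skip anything that is not a compressed input, including
            # output files left behind by an earlier run.
            if not filename.endswith(suffix) or filename == suffix:
                continue
            src = os.path.join(dirpath, filename)
            # Write next to the source, with the suffix stripped.
            dst = os.path.join(dirpath, filename[:-len(suffix)])
            with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
                # Stream in 100 KiB chunks so large files stay out of memory.
                shutil.copyfileobj(fin, fout, 100 * 1024)
            written.append(dst)
    return written
```

You would call it as decompress_tree("/Users/mac/PycharmProjects/OSwalk/Data"). Because the outputs no longer match the suffix, running it again just rewrites the same outputs instead of failing on them.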