I am trying to decompress a .bz2 file in Python. The problem seems to arise when I call bz2.decompress: the result contains a header/prefix before the original data.
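import bz2
# Read the whole compressed file as bytes and decompress it in one call
with open("/Users/X/exampleFiles/secondP5D.bz2", "rb") as f:
    decompressedFile = bz2.decompress(f.read())
# ISO-8859-1 maps every byte to a character, so decoding never raises
df = decompressedFile.decode('ISO-8859-1')
print('Decompressed :' + df)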
the output is: po_dùdbplist00 òî3¸¦!@öÎÚh@åU-<[3³{ëTÕÍuGò|À6C0Õ4ñqí¿W·GÝ>òSþUé¶ÓÙÅ. û®fP±b±Oã0SÞº%PaxHeader/secondP5D000644 000765 000024 00000000033 13705257402 016161 xustar00davidstaff000000 000000 27 mtime=1595236098.142357 secondP5D000644 000765 000024 00000000060 13705257402 014210 0ustar00davidstaff000000 000000 ES0113000058876511WG0F;2020/03/07 01:00;0;333;;
whereas the data should only be ES0113000058876511WG0F;2020/03/07 01:00;0;333;;
If I don't decode with ISO-8859-1, I get errors such as:
  File "/Users/X/Downloads/zeppelin-0.8.2-bin-all/interpreter/python/py4j-0.9.2/src/py4j/protocol.py", line 202, in smart_decode
    return unicode(s, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 549: invalid start byte
How can I open the file and use the decompressor so that there is no header? If I do the same on the Mac command line with tar xvf secondP5D.bz2, the resulting file doesn't contain the header/prefix.
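
The PaxHeader and ustar strings in the output are tar header fields, and tar xvf handles the file, so the .bz2 may actually be a bzip2-compressed tar archive rather than a single compressed file. The following is a minimal sketch under that assumption, using the standard-library tarfile module. The function name read_tar_bz2_members and its encoding parameter are hypothetical, chosen only for illustration.

```python
import tarfile


def read_tar_bz2_members(path, encoding="ISO-8859-1"):
    """Return {member name: decoded text} for every regular file in the archive.

    Assumes `path` points to a bzip2-compressed tar archive.
    """
    contents = {}
    # "r:bz2" makes tarfile decompress the bzip2 layer and parse the tar headers
    with tarfile.open(path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            extracted = archive.extractfile(member)
            contents[member.name] = extracted.read().decode(encoding)
    return contents
```

Called as read_tar_bz2_members("/Users/X/exampleFiles/secondP5D.bz2"), this would return the text of each archived file keyed by its name, without the tar headers. The binary prefix containing bplist00 might be a separate macOS metadata entry inside the archive; if so, it would show up under its own key and could be ignored by name.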