I am looking to do, in Python 3.8, the equivalent of:
xz --decompress --stdout < hugefile.xz > hugefile.out
where neither the input nor output might fit well in memory.
As I read the documentation at https://docs.python.org/3/library/lzma.html#lzma.LZMADecompressor, I could use LZMADecompressor to process incrementally available input, and its decompress() method to produce output incrementally.
However it seems that LZMADecompressor puts its entire decompressed output into a single memory buffer, and decompress() reads its entire compressed input from a single input memory buffer.
Granted, the documentation confuses me as to when the input and/or output can be incremental.
So I figure I will have to spawn a separate child process to execute the "xz" binary.
Is there any way of using the lzma Python module for this task?
Instead of using the low-level LZMADecompressor, use lzma.open to get a file object. You can then copy the data into another file object with shutil.copyfileobj.
Internally, shutil.copyfileobj reads and writes data in chunks, and the LZMA decompression is done on the fly. This avoids decompressing the whole data into memory.
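A minimal sketch of this approach, using the file names from the question. The helper name decompress_xz and the chunk_size parameter are made up for illustration; only lzma.open and shutil.copyfileobj come from the standard library.

```python
import lzma
import shutil


def decompress_xz(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress an .xz file to dst_path, one chunk at a time.

    decompress_xz and chunk_size are illustrative names, not part of lzma.
    """
    # lzma.open returns a file object that decompresses as it is read.
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj reads at most chunk_size decompressed bytes per
        # iteration, so neither file is ever held in memory in full.
        shutil.copyfileobj(src, dst, chunk_size)


if __name__ == "__main__":
    decompress_xz("hugefile.xz", "hugefile.out")
```

lzma.open also accepts an already open binary file object instead of a path, so passing sys.stdin.buffer and copying into sys.stdout.buffer should give something close to the shell pipeline in the question.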