I have a working version of decompressing bzip2 data where I call the BZ2_bzDecompress API. It goes something like this:
while (bytes_input < len) {
    isDone = false;
    // Initialize the input buffer and its length
    size_t in_buffer_size = len - bytes_input;
    the_bz2_stream.avail_in = in_buffer_size;
    the_bz2_stream.next_in = (char*)data + bytes_input;
    size_t out_buffer_size =
        output_size - bytes_uncompressed;  // size of output buffer
    if (out_buffer_size == 0) {  // out of space in the output buffer
        break;
    }
    the_bz2_stream.avail_out = out_buffer_size;
    the_bz2_stream.next_out =
        (char*)output + bytes_uncompressed;  // output buffer
    ret = BZ2_bzDecompress(&the_bz2_stream);
    if (ret != BZ_OK && ret != BZ_STREAM_END) {
        throw Bzip2Exception("Bzip2 failed. ", ret);
    }
    bytes_input += in_buffer_size - the_bz2_stream.avail_in;
    bytes_uncompressed += out_buffer_size - the_bz2_stream.avail_out;
    *data_consumed = bytes_input;
    if (ret == BZ_STREAM_END) {
        ret = BZ2_bzDecompressEnd(&the_bz2_stream);
        if (ret != BZ_OK) {
            throw Bzip2Exception("Bzip2 fail. ", ret);
        }
        isDone = true;
    }
}
This works great for native bzip2 compressed files, but for pbzip2 (Parallel Bzip2) and "Splittable" bzip2 data, it throws a "BZ_PARAM_ERROR".
The pbzip2 documentation says this:
Data compressed with pbzip2 is broken into multiple streams and each stream is bzip2 compressed looking like this: [-----|-----|-----|-----|-----|-----|-----|-----|-----]
If you are writing software with libbzip2 to decompress data created with pbzip2, you must take into account that the data contains multiple bzip2 streams so you will encounter end-of-stream markers from libbzip2 after each stream and must look-ahead to see if there are any more streams to process before quitting. The bzip2 program itself will automatically handle this condition.
Source: http://compression.ca/pbzip2/
Can someone please tell me how to handle this? Should I be using some other libbzip2 API?
Also, pbzip2 files are compatible with the normal "bunzip2" command. How is it that bzip2 handles this gracefully while my code throws a BZ_PARAM_ERROR?
Thanks.
After your BZ2_bzDecompressEnd() you need to call BZ2_bzDecompressInit() again (you must have called it initially before that loop), if there is still data left to decompress, i.e. bytes_input < len.

To decompress each of the |-----| blocks, you need to do an init, some number of decompress calls, and an end. So if you still have input left, then you need to do another init, n*decompress, end.

Make sure that you do a final end, in order to avoid a big memory leak.

You're getting a BZ_PARAM_ERROR because you are trying to use an uninitialized bz_stream to decompress. Once you do BZ2_bzDecompressEnd(), you can't use that bz_stream any more, unless you do a BZ2_bzDecompressInit() on it.