I'm trying to read git objects out of a git pack file, following the pack file format laid out here. Once I hit the compressed data I run into issues. I'm using System.IO.Compression.DeflateStream to decompress the zlib-compressed objects, and I ignore the zlib header by skipping over the first 2 bytes. For the first object, at least, those 2 bytes are 78 9C. Now the trouble starts.
1) I only know the size of the decompressed objects. The documentation for the Read method on DeflateStream states that it "Reads a number of decompressed bytes into the specified byte array", which is what I want. However, I do see people setting this count to the size of the compressed data, so one of us is doing it wrong.
2) The data I'm getting back is correct, I think (human-readable data that looks right), but the read advances the underlying stream I give it all the way to the end! For example, I ask it for 187 decompressed bytes and it reads the remaining 212 bytes, all the way to the end of the stream. That is, the whole stream is 228 bytes, and after the 187-byte deflate read the stream position is 228. I can't seek backwards, because I don't know where the compressed data ends, and not all the streams I use are seekable anyway. Is consuming the whole stream the expected behavior?
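For reference, the framing that gets skipped here can be checked in a few lines. This is a minimal sketch in Python rather than .NET, chosen only because its zlib module makes the pieces easy to inspect; `payload` is a made-up stand-in for an object body, not real pack data. It shows the 2-byte header, the 4-byte Adler-32 trailer defined by the zlib format, and the raw deflate data in between, which is the only part a raw deflate decoder understands.

```python
import zlib

payload = b"tree 0123456789abcdef\n" * 8   # stand-in for a git object body
blob = zlib.compress(payload)              # zlib-wrapped deflate (RFC 1950)

# Header: CMF and FLG. A low nibble of 8 in CMF means "deflate"; 0x78 also
# encodes a 32 KiB window. Read as a big-endian 16-bit number, the pair
# must be a multiple of 31 (true for 0x789C as well).
cmf, flg = blob[0], blob[1]
header_ok = (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0

# Trailer: the last 4 bytes are the big-endian Adler-32 of the
# uncompressed data.
trailer = int.from_bytes(blob[-4:], "big")
trailer_ok = trailer == zlib.adler32(payload)

# A raw deflate decoder (negative wbits) fed everything after the header
# stops at the end of the deflate data and leaves the trailer unread.
raw = zlib.decompressobj(wbits=-15)
body = raw.decompress(blob[2:])
leftover = raw.unused_data
```

So skipping 2 bytes at the front is only half of the framing: after the deflate data there are 4 more bytes that belong to the same object and have to be stepped over before the next pack entry starts.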
I was doing exactly the same thing as OP (reading git pack files), and managed to hack up a way around this problem.
As per Mark Adler's comment here, DeflateStream is indeed brain-dead and useless, because yes, it does read bytes beyond the compressed data. Looking through the source code here, it reads the input data in 8K blocks :-/
However, DeflateStream instances have a private member _inflater, which has a private member _zlibStream, which has a property AvailIn, which returns the number of bytes available in the input buffer. IOW, this is the number of bytes too many that have been read, so by using reflection to get at these private parts, we can move the file pointer backwards by that many bytes, to return it to where it should've been left, i.e. just past the end of the compressed data.
This code is F#, but it should be clear what's going on:
I think the 4-byte adjustment is needed because a zlib stream ends with a 4-byte Adler-32 checksum, which DeflateStream (a raw deflate decoder) does not consume.
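The rewind idea itself is not specific to .NET. As a rough sketch of the same technique in Python (not the F# from this answer): Python's zlib decompressor exposes the over-read input as `unused_data`, which plays the role AvailIn plays above, so no reflection is needed. `read_zlib_object` and `CHUNK` are hypothetical names chosen for illustration, and the stream is assumed to be seekable.

```python
import io
import zlib

CHUNK = 8192  # mirrors the 8K block size mentioned above; any size works


def read_zlib_object(stream, expected_size):
    """Inflate one zlib stream and leave `stream` just past its last byte.

    `stream` must be seekable. `expected_size` is the decompressed size
    already known from the pack entry header.
    """
    # Zlib-wrapped decoder: handles the 2-byte header and also consumes
    # the 4-byte Adler-32 trailer, so no manual adjustment is required.
    inflater = zlib.decompressobj()
    out = bytearray()
    while not inflater.eof:
        chunk = stream.read(CHUNK)
        if not chunk:
            raise EOFError("compressed data ended early")
        out += inflater.decompress(chunk)
    # unused_data holds whatever was read past the end of the compressed
    # stream; step the file position back over it.
    stream.seek(-len(inflater.unused_data), io.SEEK_CUR)
    if len(out) != expected_size:
        raise ValueError("decompressed size does not match the entry header")
    return bytes(out)
```

Called repeatedly on the same stream, this reads consecutive objects one after another. For a stream that cannot seek, the same bookkeeping still works if the leftover bytes are kept and fed to the next decompressor instead of seeking back over them.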