I'm trying to read out git objects from a git pack file, following the format for pack files laid out here. Once I hit the compressed data I'm running into issues. I'm trying to use System.IO.Compression.DeflateStream to decompress the zlib compressed objects. I basically ignore the zlib headers by skipping over the first 2 bytes. These 2 bytes for the first object anyway are 789C. Now the trouble starts.
1) I only know the size of the decompressed objects. The Read method documentation for DeflateStream states that it "Reads a number of decompressed bytes into the specified byte array," which is what I want. However, I do see people setting this count to the size of the compressed data, so one of us is doing it wrong.
2) The data I'm getting back is correct, I think (human-readable data that looks right), but it's advancing the underlying stream I give it all the way to the end! For example, I ask it for 187 decompressed bytes and it reads the remaining 212 bytes, all the way to the end of the stream. That is, the whole stream is 228 bytes, and after the deflate read of 187 bytes the position of the stream is 228. I can't seek backwards, as I don't know where the end of the compressed data is, and not all the streams I use would be seekable anyway. Is consuming the whole stream the expected behavior?
According to the page you reference (I'm not familiar with this file format myself), each block of data is indexed by an offset field in the index for the file. Since you know the length of the type and data length fields that precede each data block, and you know the offset of the next block, you also know the length of each data block (i.e. the length of the compressed bytes).
That is, the length of each data block is simply the offset of the next block minus the offset of the current block, then minus the length of the type and data length fields (however many bytes that is; according to the documentation it's variable, but you can certainly compute that length as you read it).
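As a rough sketch of that arithmetic, assuming the entry header layout described in the pack format documentation (3 type bits and 4 size bits in the first byte, then 7 size bits per continuation byte): the class and method names here are hypothetical, not part of any library. It also assumes you have already sorted the index offsets into ascending order, that the last entry is bounded by the trailing checksum rather than by another entry, and that the entry is not a delta (as I read the format, delta entries carry an extra base reference between this header and the compressed data).

```csharp
using System;
using System.IO;

// Hypothetical helper, not a library API.
static class PackEntryHeader
{
    // Reads the variable-length type/size header at the current position and
    // reports how many bytes it occupied.
    public static (int Type, long Size, int HeaderLength) Read(Stream pack)
    {
        int b = pack.ReadByte();
        if (b < 0) throw new EndOfStreamException();

        int type = (b >> 4) & 0x7;   // bits 6..4 of the first byte
        long size = b & 0x0F;        // low 4 bits of the decompressed size
        int shift = 4;
        int headerLength = 1;

        while ((b & 0x80) != 0)      // high bit set: another size byte follows
        {
            b = pack.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            size |= (long)(b & 0x7F) << shift;
            shift += 7;
            headerLength++;
        }
        return (type, size, headerLength);
    }

    // Compressed length = distance to the next entry minus the header bytes.
    public static long CompressedLength(long offset, long nextOffset, int headerLength)
        => nextOffset - offset - headerLength;
}
```

For instance, the two header bytes 9B 0B decode to type 1 with a size of 187, and a made-up entry at offset 12 whose successor starts at offset 150 would have 150 - 12 - 2 = 136 compressed bytes. Note that this count includes the 2-byte zlib header you are skipping.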
So:
1) The documentation is correct. DeflateStream is a subclass of Stream, and has to follow that class's rules. Since the Read() method of Stream outputs up to the number of bytes requested, these must be uncompressed bytes.
Note that per the above, you do know the size of the compressed objects. It's not stored in the file, but you can derive that information from the things that are stored in the file.
2) Yes, I would expect that to happen. Or at a minimum, I would expect some buffering to happen, so even if it didn't read all the way to the end of the stream, I would expect it to read at least some number of bytes past the end of the compressed data.
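If you want to see both points for yourself, something like the following should do it. This is only a small experiment, with a made-up class name: it compresses 1000 bytes, appends 100 unrelated bytes to stand in for "the next object", then asks Read() for 1000 decompressed bytes and reports where the underlying stream ended up. How far past the compressed data the position lands depends on the runtime's internal buffering, so I'm not claiming a specific number.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical demo class, for experimentation only.
static class ReadCountDemo
{
    public static (int Decompressed, long CompressedLength, long PositionAfterRead, long TotalLength) Run()
    {
        byte[] payload = new byte[1000];   // 1000 zero bytes compress to far fewer bytes

        // Raw deflate data followed by 100 bytes that belong to "something else".
        var source = new MemoryStream();
        using (var deflate = new DeflateStream(source, CompressionMode.Compress, leaveOpen: true))
        {
            deflate.Write(payload, 0, payload.Length);
        }
        long compressedLength = source.Position;
        source.Write(new byte[100], 0, 100);
        source.Position = 0;

        var output = new byte[payload.Length];
        using (var inflate = new DeflateStream(source, CompressionMode.Decompress, leaveOpen: true))
        {
            // The count passed to Read is in decompressed bytes. Loop, because
            // Read may return fewer bytes than requested.
            int total = 0;
            while (total < output.Length)
            {
                int n = inflate.Read(output, total, output.Length - total);
                if (n == 0) break;
                total += n;
            }
            return (total, compressedLength, source.Position, source.Length);
        }
    }
}
```

Compare PositionAfterRead against CompressedLength in the result: any difference is data the DeflateStream pulled from the underlying stream but did not need.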
It seems to me that you have at least a couple of options:
Read the compressed bytes for the block into a separate MemoryStream object, and decompress the data from that stream rather than the original.
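A minimal sketch of the MemoryStream approach, assuming you have already computed compressedLength from the offsets as described earlier and that the pack stream is positioned at the first byte of the zlib data. PackObjectReader and its methods are names I made up for illustration. It keeps your trick of skipping the 2-byte zlib header so that DeflateStream sees raw deflate data.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not a library API.
static class PackObjectReader
{
    // Reads exactly `count` bytes or throws; Stream.Read may return fewer than requested.
    static void ReadFully(Stream s, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, total, count - total);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }
    }

    // `pack` must be positioned at the first byte of the zlib data for one entry.
    public static byte[] ReadObject(Stream pack, int compressedLength, int decompressedSize)
    {
        // Pull only this entry's compressed bytes out of the pack stream.
        var compressed = new byte[compressedLength];
        ReadFully(pack, compressed, compressedLength);

        // Skip the 2-byte zlib header (e.g. 78 9C); DeflateStream expects raw deflate data.
        using (var isolated = new MemoryStream(compressed, 2, compressedLength - 2))
        using (var inflate = new DeflateStream(isolated, CompressionMode.Decompress))
        {
            var result = new byte[decompressedSize];
            ReadFully(inflate, result, decompressedSize);
            return result;
        }
    }
}
```

Because the DeflateStream only ever sees the isolated copy, any over-reading it does happens inside the MemoryStream. The pack stream advances by exactly compressedLength bytes and never needs to seek, so this also works for non-seekable streams.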