Read a deflate stream until adler32 checksum


.NET does not have a ZlibStream, so I am trying to implement my own on top of DeflateStream, which .NET does have. DeflateStream also apparently does not support preset dictionaries, so I skip those in my ZlibStream as well.

Writing works well, but I have a problem with my Read method:

public override int Read(byte[] buffer, int offset, int count)
{
    EnsureDecompressionMode();
    if (!_readHeader)
    {
        // read the header (CMF|FLG|optional DIC)
        _readHeader = true;
    }

    var res = _deflate.Read(buffer, offset, count);
    if (res == 0) // EOF
    {
        // read adler32 checksum
        BaseStream.ReadFully(_scratch, 0, 4);
        var checksum = (uint)_scratch[0] << 24 |
                       (uint)_scratch[1] << 16 |
                       (uint)_scratch[2] << 8 |
                       (uint)_scratch[3];
        if (checksum != _adler32.Checksum)
        {
            throw new ZlibException("Invalid checksum");
        }
    }
    else
    {
        _adler32.CalculateChecksum(buffer, offset, res);
    }
    return res;
}

Where:

  • _scratch is a byte[4] used as a temporary buffer
  • _deflate is a DeflateStream.
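The _adler32 helper is not shown in the question. As a rough sketch of what such a running checksum might look like, the following follows the Adler-32 definition used by zlib (two sums modulo 65521, seeded with 1). The names Adler32Sketch and Update are hypothetical and are not the asker's API.

```csharp
using System;
using System.Text;

public static class Adler32Sketch
{
    // Largest prime below 2^16, as used by Adler-32.
    private const uint Modulus = 65521;

    // Folds `count` bytes starting at `offset` into a running checksum.
    // Pass 1 as `adler` for the first chunk of a fresh stream.
    public static uint Update(uint adler, byte[] buffer, int offset, int count)
    {
        uint a = adler & 0xFFFF; // low half: sum of bytes plus one
        uint b = adler >> 16;    // high half: sum of the running `a` values
        for (int i = offset; i < offset + count; i++)
        {
            a = (a + buffer[i]) % Modulus;
            b = (b + a) % Modulus;
        }
        return (b << 16) | a;
    }
}
```

Because the state is carried in the return value, the checksum can be updated once per Read call, which matches how the question feeds each decompressed chunk to _adler32.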

Zlib's format is CMF|FLG|optional DICT|compressed data|adler32. So I need a way to stop reading when the adler32 is reached. Initially, I thought DeflateStream would return EOF once the compressed data ends, but it turns out it reads until EOF of the underlying stream, consuming the adler32 as if it were compressed data. As a result, when I try to read the adler32 from BaseStream inside the if block, an EOF exception is thrown.

So how do I make DeflateStream stop reading the adler32 as if it's compressed data and instead EOF there or do something equivalent, so that I can read adler32 from the BaseStream without compression?
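For reference, the header portion of the layout above (the part elided in the Read method's comment) could be checked roughly as follows. This is a sketch based on the zlib format rules: the low nibble of CMF must be 8 for deflate, CMF and FLG taken as a 16-bit big-endian value must be a multiple of 31, and bit 5 of FLG signals a preset dictionary. ZlibHeaderSketch and ReadHeader are hypothetical names.

```csharp
using System;
using System.IO;

public static class ZlibHeaderSketch
{
    // Reads CMF and FLG from `stream`.
    // Returns true if a 4-byte preset dictionary id follows the header.
    public static bool ReadHeader(Stream stream)
    {
        int cmf = stream.ReadByte();
        int flg = stream.ReadByte();
        if (cmf < 0 || flg < 0)
            throw new EndOfStreamException("Truncated zlib header");

        // Low four bits of CMF hold the compression method; 8 means deflate.
        if ((cmf & 0x0F) != 8)
            throw new InvalidDataException("Compression method is not deflate");

        // CMF and FLG together must form a multiple of 31 (the FCHECK bits).
        if (((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("zlib header check failed");

        // Bit 5 of FLG is FDICT: a preset dictionary id follows.
        return (flg & 0x20) != 0;
    }
}
```

A caller that does not support dictionaries, as in the question, could simply throw when this returns true.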


There are 2 best solutions below

8
On

Since files have a fixed size, can't you simply stop at BaseStream.Length - sizeof(int)? Adjust the read buffer if necessary and then read the uncompressed checksum.

Something like:

public override int Read(byte[] buffer, int offset, int count)
{
    // read header...

    int res;
    // Note: count is measured in decompressed bytes, so this check is only approximate.
    if (BaseStream.Position + count > BaseStream.Length)
    {
        // EOF, skip the last four bytes (adler32) and read them without decompressing
        res = _deflate.Read(buffer, offset, count - sizeof(int));
    }
    else
    {
        res = _deflate.Read(buffer, offset, count);
    }

    // continue processing the data
    return res;
}

not tested
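A related variant of the same idea, sketched under two assumptions: the underlying stream is seekable, and the zlib data ends exactly at the end of that stream. In that case the trailer can be located by position once DeflateStream reports EOF, regardless of how far DeflateStream has read ahead. ZlibTrailerSketch and ReadAdler32Trailer are hypothetical names.

```csharp
using System;
using System.IO;

public static class ZlibTrailerSketch
{
    // Assumes `stream` is seekable and the adler32 trailer is its last four bytes.
    public static uint ReadAdler32Trailer(Stream stream)
    {
        if (!stream.CanSeek || stream.Length < 4)
            throw new InvalidOperationException("Need a seekable stream of at least 4 bytes");

        // Jump to the trailer regardless of the current read position.
        stream.Seek(-4, SeekOrigin.End);

        var scratch = new byte[4];
        int read = 0;
        while (read < 4)
        {
            int n = stream.Read(scratch, read, 4 - read);
            if (n == 0)
                throw new EndOfStreamException("Truncated adler32 trailer");
            read += n;
        }

        // The checksum is stored big-endian, as in the question's code.
        return (uint)scratch[0] << 24 |
               (uint)scratch[1] << 16 |
               (uint)scratch[2] << 8 |
               (uint)scratch[3];
    }
}
```

This does not help when the zlib data is embedded in a larger file or arrives over a non-seekable stream, because the position of the trailer is then unknown in advance.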

0
On

Looking through the source code here, DeflateStream reads the input data in 8K blocks :-/, so if your input file is small, it will look like it's reading up to the end of the file.

However, DeflateStream instances have a private member _inflater, which has a private member _zlibStream, which has a property AvailIn that returns the number of bytes still available in the input buffer. In other words, this is the number of bytes that were read beyond the end of the compressed data. By using reflection to get at these private parts, we can move the file pointer backwards by that many bytes, returning it to where it should have been left, i.e. just past the end of the compressed data.

This code is F#, but it should be clear what's going on:

open System.IO
open System.IO.Compression
open System.Reflection

// zstream is the DeflateStream instance
let inflater = typeof<DeflateStream>.GetField( "_inflater", BindingFlags.NonPublic ||| BindingFlags.Instance ).GetValue( zstream )
let zlibStream = inflater.GetType().GetField( "_zlibStream", BindingFlags.NonPublic ||| BindingFlags.Instance ).GetValue( inflater )
let availInMethod = zlibStream.GetType().GetProperty( "AvailIn" ).GetMethod
let availIn: uint32 = unbox( availInMethod.Invoke( zlibStream, null ) )
// inp is the input file
inp.Seek( -(int64 availIn), SeekOrigin.Current ) |> ignore