python3 and numba, wrong calculation of adler32 checksum


I have written the same algorithm in 3 flavours:

  • plain Python
  • Python optimized with Numba
  • Micropython optimized with Viper

I am sure about the correctness of the plain Python and MicroPython routines: their checksum over some megabytes of random data equals the checksum computed by zlib.adler32

This is the routine written for Numba:

@numba.jit(nopython=True)
def adler_32(data: bytes, length: np.int32, init: np.uint32) -> np.uint32:
    # note: on the first call, init should be 1
    ADLER = np.uint16(65521)
    NMAX = np.int32(5552)                               # max iterations to keep s1, s2 as uint16

    s1 = np.uint16(init & 0xffff)
    s2 = np.uint16(init >> 16)
    i = 0
    while length > 0 :
        k = min(length, NMAX)

        length -= k
        while k :
            s1 += np.uint8(data[i])
            k -= 1
            s2 += s1
            i += 1

        if s1 >= ADLER :
            s1 -= ADLER
        if s2 >= ADLER :
            s2 -= ADLER

    return  (np.uint32(s2) << 16) | s1

Problem: s1 and s2 assume values greater than 0xffff, so the results are incorrect.

Why do s1 and s2, declared as uint16, take on bigger values? Are there any caveats or side effects I'm not seeing?

CORRECTION

This is obviously wrong; in a hurry, I copied and pasted it from a microcontroller routine where the data are small and s1, s2 stay within uint16.

        if s1 >= ADLER :
            s1 -= ADLER
        if s2 >= ADLER :
            s2 -= ADLER

The correct, general algorithm is

    s1 %= ADLER
    s2 %= ADLER
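For reference, here is a minimal pure-Python sketch of the routine with the `%=` fix applied (the function name `adler32_py` is mine), checked directly against `zlib.adler32`:

```python
import zlib

ADLER = 65521
NMAX = 5552  # max bytes per inner loop before s2 could overflow a uint32

def adler32_py(data: bytes, init: int = 1) -> int:
    # Split the running checksum into its two 16-bit halves.
    s1 = init & 0xffff
    s2 = init >> 16
    i, length = 0, len(data)
    while length > 0:
        k = min(length, NMAX)
        length -= k
        for b in data[i:i + k]:
            s1 += b
            s2 += s1
        i += k
        # Full reduction: a single conditional subtraction is not enough here.
        s1 %= ADLER
        s2 %= ADLER
    return (s2 << 16) | s1

data = bytes(range(256)) * 1000  # 256 kB of test data
assert adler32_py(data) == zlib.adler32(data)
```

Python integers don't overflow, so this version only needs the NMAX chunking to mirror the fixed-width algorithm; the modulus at the end of each chunk is what keeps the fixed-width variants correct.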


BEST ANSWER

You need to understand the magic number in there: 5552. One more than that, 5553, is the worst-case number of iterations that will overflow s2 if it is a 32-bit unsigned integer. Both s1 and s2 need to be declared as 32 bits, not 16 bits.

Worst case is s1 and s2 starting at their highest values, 65520, and all of the bytes being 255. Then s2 becomes (1 + n)(65520 + 255n/2), where n is the number of bytes, all with value 255. Setting that equal to 2^32 and solving for n, you get 5552.19.
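That bound can be verified numerically; a quick sketch (names are mine) that finds the largest n for which the worst-case s2 still fits in a uint32:

```python
# Worst case: s1 and s2 both start at ADLER-1 = 65520 and every byte is 0xff.
# After n such bytes, s2 = (1 + n) * 65520 + 255 * n * (n + 1) / 2,
# which is the closed form of 65520 + sum_{k=1..n} (65520 + 255*k).
def worst_case_s2(n: int) -> int:
    return (1 + n) * 65520 + 255 * n * (n + 1) // 2

n = 0
while worst_case_s2(n + 1) < 2**32:
    n += 1
print(n)  # -> 5552, the NMAX used in the routine
```

One more byte (n = 5553) pushes the worst-case s2 past 2^32, which is exactly why the inner loop must be cut off at 5552.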

Another error is that `if sx >= ADLER: sx -= ADLER` falls well short of assuring that sx is less than ADLER, which it needs to be at that point in the loop, again for 5552 to have any meaning. Those both need to be sx %= ADLER.
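A small numeric check makes the point: after one full worst-case chunk, s2 is tens of thousands of multiples of ADLER, so a single conditional subtraction cannot bring it below ADLER.

```python
ADLER = 65521
# Worst-case s2 after a full 5552-byte chunk (s2 starts at 65520, all bytes 0xff):
s2 = (1 + 5552) * 65520 + 255 * 5552 * 5553 // 2
assert s2 - ADLER >= ADLER          # one subtraction leaves s2 far above ADLER
print(s2 // ADLER)                  # how many subtractions an 'if' would need
```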


(I wrote this as a comment to Mark's post, but SO rejected it as too long.)

I discovered the issue.

Today I tested up to 8 GB of payload, comparing every 256 MB the partial checksum from zlib.adler32 with the one from my routines. One payload was random data, the other was all 0xff.

Both routines return the same checksums, all the way to the end. And my routine still uses np.uint16 for s1 and s2!
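The chunked comparison itself can be done with zlib.adler32's second argument, which takes the running checksum as the starting value, so the reference side of such a test needs no custom code:

```python
import zlib

data = bytes(range(256)) * 4096   # ~1 MB stand-in for the multi-GB payload
chunk_size = 256 * 1024

# Feed the data chunk by chunk, passing the running value back in.
running = 1  # Adler-32 starts at 1
for off in range(0, len(data), chunk_size):
    running = zlib.adler32(data[off:off + chunk_size], running)

assert running == zlib.adler32(data)
```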

It seems to me the NumPy variable types are only a "hint" to the JIT; they aren't real types (like C declarations). At some point the variables stop overflowing; probably there is a hidden "unboxing" from 16 to 32 bits.

Check: if before the modulus I force 16 bits with `sx &= 0xffff`, the results aren't correct, as @Mark Adler wrote. I got sidetracked by this "unboxing", and also because I was working on the same routine in three different Python implementations.
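That masking check can be reproduced in plain Python (a sketch; the function name is mine): keeping s1 and s2 artificially at 16 bits before the modulus breaks the checksum precisely because both have already outgrown uint16 inside a 5552-byte chunk.

```python
import zlib

ADLER, NMAX = 65521, 5552

def adler32_masked(data: bytes) -> int:
    # Deliberately wrong: truncate s1/s2 to 16 bits before the modulus,
    # simulating variables that really stayed uint16.
    s1, s2 = 1, 0
    i, length = 0, len(data)
    while length > 0:
        k = min(length, NMAX)
        length -= k
        for b in data[i:i + k]:
            s1 += b
            s2 += s1
        i += k
        s1 = (s1 & 0xffff) % ADLER
        s2 = (s2 & 0xffff) % ADLER
    return (s2 << 16) | s1

data = b'\xff' * 6000
assert adler32_masked(data) != zlib.adler32(data)  # truncation corrupts the sum
```

With 6000 bytes of 0xff, s1 reaches about 1.4 million and s2 about 3.9 billion inside the first chunk, so the `& 0xffff` discards real high bits and the result diverges from zlib's, confirming that the Numba version's variables could not actually have been 16 bits wide.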

Sometimes I hate Python: in C I would have found the problem in 5 minutes. I'm an old programmer, used to compiled, typed languages ;-)