I'm reading through the following paper on how to implement CRC32 efficiently using the PCLMULQDQ instruction introduced in Intel Westmere and AMD Bulldozer:
V. Gopal et al. "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction." 2009. http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
I understand the algorithm, but one thing I'm not sure about is how to calculate the constants $k_i$. For example, they provide the constant values for the IEEE 802.3 polynomial:
- k1 = x^(4*128+64) mod P(x) = 0x8833794C
- k4 = x^128 mod P(x) = 0xE8A45605
- mu = x^64 div P(x) = 0x104D101DF
and so on. I can just use these constants as I only need to support the one polynomial, but I'm interested: how did they calculate those numbers? I can't just use a typical bignum implementation (e.g. the one provided by Python) because the arithmetic must happen in GF(2).
It's just like regular division, except you exclusive-or instead of subtract. So start with the most significant 1 in the dividend. Exclusive-or the dividend by the polynomial, lining up the most significant 1 of the polynomial with that 1 in the dividend to turn it into a zero. Repeat until you have eliminated all of the 1's above the low n bits, where n is the order of the polynomial. The result is the remainder.
Make sure that your polynomial has the high term in the n+1th bit. I.e., use
0x104C11DB7
, not0x4C11DB7
.If you want the quotient (which you wrote as "div"), then keep track of the positions of the 1's you eliminated. That set, shifted down by n, is the quotient.
Here is how: