Finding the length of the common prefix in two bytes

1.3k Views Asked by At

Given two bytes, how would I find the length of the common bits at the start of the two bytes.

For example:

9 == 00001001
6 == 00000110

Common prefix is 0000, length 4

I'm working in C#, so please stick to C# operations only.

Addendum: This particular piece of code will run thousands of times and needs to be very fast.

10

There are 10 best solutions below

7
On BEST ANSWER

EDIT: Thanks to the comments, I found that I misunderstood the problem. (Below is a fixed version).

With a lookup table:

readonly static int[] bytePrefix = new int[] {
    8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

And use it XORing the two bytes:

bytePrefix[9 ^ 6]

I believe this is as fast as it can get, it's just one XOR operation and an array lookup (you can also change it to 2 array lookups, but it would use 256 times more memory and probably be slower, bitwise it really fast).

1
On

Here's one without a table or a loop:

len =  (a^b) ? (7 - (int)Math.Log( a^b, 2)) : 8;

Explanation:

log2 X is the power to which the number 2 must be raised to obtain the value X. Since each bit in a binary number represents the next power of 2, you can use this fact to find the highest bit set (counting from 0):

2**0   = 1 = 0b0001;  log2(1) = 0
2**1   = 2 = 0b0010;  log2(2) = 1
2**1.6 =~3 = 0b0011;  log2(3) =~1.6; (int)log2(3) = 1
2**2   = 4 = 0b0100;  log2(4) = 2
...
2**3   = 8 = 0b1000;  log2(8) = 3

So the code works by taking a XOR b, which sets only the bits which are different. If the result is non-zero, we use log2 to find the highest bit set. 7 less the result gives the number of leading zeros = the number of common bits. There is a special case when a XOR b == 0: log2(0) is -Infinity, so that won't work, but we know that all the bits must match, so the answer is 8.

0
On

The 256-byte table versions seem quite nice; depending upon caching and branching issues, a 16-byte table version might or might not run faster. Something like:

/* Assumes table[16] is defined similarly to the table[256] in earlier examples */
unsigned int find_mismatch(unsigned char a, unsigned char b)
{
  unsigned char mismatch;
  mismatch = a^b;
  if (mismatch & 0xF0)
    return table[mismatch >> 4];
  else
    return table[mismatch]+4;
}

More instructions, including a branch, but since the table is now only 16 bytes it would take only one or two cache misses to fill entirely. Another approach, using a total of three lookups on a 16-byte table and a five-byte table, but no branching:

unsigned char table2[5] = {0,0,0,0,0xFF};

unsigned int find_mismatch(unsigned char a, unsigned char b)
{
  unsigned char mismatch,temp2;
  mismatch = a^b;
  temp2 = table[mismatch >> 4];
  return temp2 + (table2[temp2] & table[mismatch & 15]);
}

One would have to do some profiling in the real application to see whether the reduced cache load of the smaller tables was sufficient to offset the extra instructions.

1
On

If you're in a space-limited environment (which obviously you're not if you're using C#, but just in general) and can't afford a lookup table:

byte test = byte1 ^ byte2;
int length = 0;
if ((test & 0x80) == 0)
{
    if ((test & 0x40) == 0)
    {
        if ((test & 0x20) == 0)
        {
            if ((test & 0x10) == 0)
            {
                // I think you get the idea by now.
                // Repeat for the lower nibble.
            }
            else
                length = 3;
        }
        else
            length = 2;
    }
    else
        length = 1;
}

This is basically an unraveled loop to find the first 1 bit in the XOR'd number. I don't think it can get any faster than this without the lookup table.

0
On
int i;
for (i=0;i<sizeof(byte);i++)
    if (a >> sizeof(byte)-i != b >> sizeof(byte)-i) break;
12
On
byte x = 9;
byte y = 6;

while ( x != y )
{
    x >>= 1;
    y >>= 1;
}

Basically, remove a bit from the right of each number until the two are equal. When they become equal, their bits are equal too.

You can keep track of the length of the prefix easily by introducing another variable. I'll leave that to you.

If you want it to be fast, and considering that you're dealing with bytes, why not precompute the values and return the answer in a single operation? Run this algorithm for all possible combinations of two bytes and store the result in a table.

You only have 2^8 * 2^8 = 2^16 possibilities (2^15 actually, because x = 6 and y = 9 is the same as x = 9 and y = 6). If you can afford the initial time and memory, precomputation should be fastest in the end.

Edit:

You got a solution that's at least better for precomputation and probably faster in general: find the leftmost 1 bit in x ^ y. Using this, build a table Pre where Pre[i] = position of leftmost 1 bit in i. You only need 2^8 bytes for this table.

1
On

Here's a procedural way:

int r = 8;
while (a != b)
{
    a >>= 1;
    b >>= 1;
    r -= 1;
}

Here's a way that uses a lookup table with just 256 entries:

int[] lookupTable;

void createLookupTable()
{
    lookupTable = new int[256];
    for (int a = 0; a <= 255; ++a)
    {
        int n = 8;
        byte b = (byte)a;
        while (b > 0) {
            b >>= 1;
            n -= 1;
        }
        lookupTable[a] = n;
    }
}

int commonPrefix(byte a, byte b)
{
    return lookupTable[a ^ b];
}

And just for fun here's a way to do it with LINQ:

int r = 8 - Enumerable.Range(0, 9).Where(n => a >> n == b >> n).First();
0
On

This can be restated as a simpler problem with a known fast solution:

  • Find the left-most true bit in X ^ Y.

Some code (apparently code can't immediately follow a bulleted list?!?)

 int findCommonPrefix(long x, long y, out long common)
 {
    int prefixPlace = 0;
    int testPlace = 32;
    long w, mismatch = x ^ y;
    do {
       w = mismatch >> testPlace;
       if (w != 0) { prefixPlace |= testPlace; mismatch = w; }
       testPlace >>= 1;
    } while (testPlace != 0);
    common = x >> prefixPlace;
    return 64 - prefixPlace;
 }

This needs only 6 iterations to find the common prefix in a 64-bit long, the byte version will need only 3 iterations. Unroll the loop for even more speed.

3
On

Another approach using exclusive or (xor):

public int GetCommonPrefixLength(byte a, byte b)
{
    int c = a ^ b;
    int len = -1;
    while ((++len < 8) && ((c & 0x80) == 0))
        c = c << 1;
    return len;
}
0
On

First get the binary difference between the bytes using the xor operator. Then you just shift bits out to the right until the difference is zero:

byte b1 = 6;
byte b2 = 9;

int length = 8;
for (int diff = b1 ^ b2; diff != 0; length--) diff >>= 1;

This will give you a minimum of calculations in the loop, so it will be rather fast.