16 bit unique key to 8 bit array index (perfect hash)

503 Views Asked by At

I have a set of structs that are uniquely identified by a uint16_t. There will never be more than 256 of these structs (due to reasons I won't get into a uint16_t must be used to identify the structs).

I would like to store these structs via an array of pointers. Since there will never be more than 256 structs, I would like to statically allocate an array of struct pointers with size 256. To do this though, I need a function to uniquely map uint16_t onto uint8_t.

Given that I will know all of the keys at runtime (though I won't know before runtime) is there an algorithm that exists that will give me a unique mapping (i.e. perfect hash) in general?

One caveat is that the system I am using has 16bit addresses. So for efficiency reasons I don't want to use any types larger than uint16_t.

2

There are 2 best solutions below

2
John Bollinger On

Given that I will know all of the keys at runtime (though I won't know before runtime) is there an algorithm that exists that will give me a unique mapping (i.e. perfect hash) in general?

Given that you have up to 256 (16-bit) values to map, there are in principle many mappings you could use. If the keys to be supported are uncorrelated, however, then any algorithm to compute the mappings requires all 256 keys or functions of them as parameters. In comments, for example, the idea of a 256th-degree polynomial is discussed, and the parameters would there are the coefficients of the polynomial.

Now consider that since the mapping needs 256 parameters, it also will use all those parameters in some way. How, then, can something with these general characteristics be efficiently computable?

The best solution I see for uncorrelated keys is to put all the keys in an array, sort them, and use each key's index in the sorted array as the wanted hash value. (Thus the parameters are the keys themselves in this case.) You can compute those indices fairly efficiently by binary search. Supposing that you store the sorted array for the duration of the program, I don't think you can do any such computation much more efficiently than this, and it's simple enough that you can be confident of its correctness.

That assumes you know all the keys before you need to hash any of them. If that is not the case then at minimum you can use an unsorted array and a linear search (though there may be in-between approaches, too). A linear search may not seem particularly efficient, but it's unlikely to be worse on average than a purely arithmetic computation involving 256 parameters.

11
HXSP1947 On

I ended up using the first fit algorithm to uniquely map 16 bit values onto 8 bit values (works under the assumption that there are no more than 256 16 bit values). Below is a very short example that I coded up to test it. While the mapping function is fairly expensive (called create mapping below), the get_value function is constant. Therefore, once the mapping is established it should be fairly quick to calculate the hash (give by remainder + offset[divisor] in my example) and get the associated value.

uint16_t keys[256];
uint16_t actual_mapping[256];
uint8_t offset[256];
uint8_t num_keys = 0;

void 
create_mapping()
{
    uint8_t mapping_matrix[num_keys][2];

    uint8_t index;
    uint8_t test_index;
    for(index = 0; index < num_keys; index++)
    {
        mapping_matrix[index][0] = (uint8_t) (keys[index] / 256);
        mapping_matrix[index][1] = keys[index] % 256;
    }

    for(index = 0; index < num_keys - 1; index++)
    {
        uint8_t hash_not_found = 1;
        while(hash_not_found)
        {
            hash_not_found = 0;
            for(test_index = index + 1; test_index < num_keys; test_index++)
            {
                if(mapping_matrix[index][0] != mapping_matrix[test_index][0])
                {
                    if((uint8_t) (mapping_matrix[index][1] + offset[mapping_matrix[index][0]]) == (uint8_t) (mapping_matrix[test_index][1] + offset[mapping_matrix[test_index][0]]))
                    {
                        hash_not_found = 1;
                        offset[mapping_matrix[index][0]]++;
                        break;
                    }
                }
            }
        }

        actual_mapping[(uint8_t) (mapping_matrix[index][1] + offset[mapping_matrix[index][0]])] = keys[index];
    }
}

uint16_t
get_value(uint16_t value)
{
    uint8_t divisor = (uint8_t) (value / 256);
    uint8_t remainder = value % 256;
    return actual_mapping[(uint8_t) (remainder + offset[divisor])];
}

int main(int argc, char** argv) {

    keys[0] = 0;
    keys[1] = 256;
    keys[2] = 4;
    keys[3] = 13000;
    keys[4] = 14000;
    keys[5] = 15000;
    keys[6] = 16000;
    keys[7] = 3500;
    keys[8] = 69;
    keys[9] = 15;
    keys[10] = 16;
    keys[11] = 789;
    keys[12] = 12001;

    num_keys = 13;

    create_mapping();

    uint8_t index;
    for(index = 0; index < num_keys; index++)
    {
        printf("%hu\n", get_value(keys[index]));
    }  


}