Faster way to convert three uint32_t to generate a unique key in C

198 Views Asked by At

I have three uint32_t that when combined, they will generate a unique key. I have to do this about 100M or more & potentially several times a day and store that in a key-value database. I'd like to keep the key to the least amount of bytes possible. I'm doing it in the following way but I'm curious if there's is a faster way to do this.

char *key = xmalloc(snprintf(NULL, 0, "%" PRIu32 "-%" PRIu32 "-%" PRIu32,num1,num2,num3) + 1);   
sprintf(key, "%" PRIu32 "-%" PRIu32 "-%" PRIu32, num1,num2,num3);
2

There are 2 best solutions below

3
On BEST ANSWER
  • Converting to decimal representation is rather costly. You can get faster conversion if you use hexadecimal:

      sprintf(key, "%" PRIx32 "-%" PRIx32 "-%" PRIx32, num1, num2, num3);
    
  • As @AKX mentioned, use a fixed sized buffer. Since the string is (presumably) copied into the database, you shouldn't worry about it taking more space than necessary in the DB:

      char key[32];
      snprintf(key, sizeof(key), "%" PRIx32 "-%" PRIx32 "-%" PRIx32, num1, num2, num3);
    

    The database engine doesn't know that you over-allocated your buffer. It will allocate it's own memory based on the actual length of the string rather than the size of the buffer.

  • Implement your own hexadecimal formatting. snprintf needs to parse its format string and interpret it against the argument list at runtime. This has a non-negligible overhead for such tasks like yours. Instead you can make your own int32-to-hex conversion that's specialized for your task. I'd use "abcdefghijklmnop" for the digits instead of the traditional "0123456789abcdef".

  • Does your key-value database require text-encoded keys? If not, you can try a binary encoding for your keys (e.g. take a look at SQLite4 varint encoding for inspiration).

0
On

If you prefer text-encoded keys, I would take Yakov's suggestion a step further (well, TWO steps) and use base64 encoding instead of hex. This way you will pack 6 bits into one character instead of only 4.

The implementation would have multiple bitshifts plus lookup table. I bet it will be faster than printf.