Performance difference between bitpacking bytes into a u32 vs storing them in a vec<u8>?


Intro:

I'm curious about the performance difference (both CPU and memory usage) of storing small numbers as bitpacked unsigned integers versus as vectors of bytes.

Example

I'll use the example of storing RGBA values. They're 4 bytes, so it is very tempting to store them as a u32.
However, it would be more readable to store them as a vector of u8 values.


As a more detailed example, say I want to store and retrieve the color rgba(255,0,0,255).

This is how I would go about doing the two methods:

// Bitpacked:
let i: u32 = 4278190335;
// binary is 11111111 00000000 00000000 11111111
// In reality I would most likely do something more like:
let i: u32 = (255 << 24) + 255; // parentheses needed: `+` binds tighter than `<<`

// Vector:
let v: Vec<u8> = vec![255, 0, 0, 255];

Then the two red values could be queried with

i >> 24
// or
v[0]
// both expressions evaluate to 255 (I'm really new to Rust <3 )

Question 1

As far as I know, the values of v must be stored on the heap, and so there are the performance costs associated with that. Are these costs significant enough to make bit packing worth it?

Question 2

Then there are the two expressions i >> 24 and v[0]. I don't know how fast Rust is at bit shifting versus reading values off the heap. I'd test it myself, but I won't have access to a machine with Rust installed for a while. Are there any immediate insights someone could give on the drawbacks of these two operations?

Question 3

Finally, is the difference in memory usage as simple as just storing 32 bits on the stack for the u32 versus storing 64 bits on the stack for the pointer v as well as 32 bits on the heap for the values of v?
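(For what it's worth, `std::mem::size_of` can check the stack-side sizes directly. One assumption worth flagging: a `Vec` is not a single pointer but three machine words, i.e. pointer, length, and capacity, so it is 24 bytes on a 64-bit target, plus the heap buffer.)

```rust
use std::mem::size_of;

fn main() {
    // A u32 is 4 bytes, stored inline (stack, register, or wherever it lives).
    assert_eq!(size_of::<u32>(), 4);

    // A Vec<u8> header is three words: pointer, length, capacity.
    // On a 64-bit target that's 24 bytes, before counting the heap buffer.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());

    println!(
        "u32: {} bytes, Vec<u8> header: {} bytes",
        size_of::<u32>(),
        size_of::<Vec<u8>>()
    );
}
```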

Sorry if this question is a bit confusing

2 Answers

Accepted Answer

Using a Vec will be more expensive; as you mentioned, it will need to perform heap allocations, and access will be bounds-checked as well.

That said, if you use an array [u8; 4] instead, the performance compared with a bitpacked u32 representation should be almost identical.

In fact, consider the following simple example:

pub fn get_red_bitpacked(i: u32) -> u8 {
    (i >> 24) as u8
}

pub fn get_red_array(v: [u8; 4]) -> u8 {
    // index 3 is the high byte of the corresponding little-endian u32,
    // so this matches `i >> 24` on the bitpacked side
    v[3]
}

pub fn test_bits(colour: u8) -> u8 {
    let colour = colour as u32;
    let i = (colour << 24) + colour;
    get_red_bitpacked(i)
}

pub fn test_arr(colour: u8) -> u8 {
    let v = [colour, 0, 0, colour];
    get_red_array(v)
}

I took a look at this on Compiler Explorer, and the compiler decided that get_red_bitpacked and get_red_array were completely identical: so much so that it didn't even bother generating code for the former. The two "test" functions obviously optimised to the exact same assembly as well.

example::get_red_array:
        mov     eax, edi
        shr     eax, 24
        ret

example::test_bits:
        mov     eax, edi
        ret

example::test_arr:
        mov     eax, edi
        ret

Obviously this example was seen through by the compiler: for a proper comparison you should benchmark with actual code. That said, I feel fairly safe in saying that with Rust the performance of u32 versus [u8; 4] for these kinds of operations should be identical in general.
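If you do want to benchmark it, a proper harness such as Criterion is the usual tool; as a rough std-only sketch (timings are illustrative only), `std::hint::black_box` can be used to stop the compiler folding the loops away entirely:

```rust
use std::hint::black_box; // stable since Rust 1.66
use std::time::Instant;

fn get_red_bitpacked(i: u32) -> u8 {
    (i >> 24) as u8
}

fn get_red_array(v: [u8; 4]) -> u8 {
    v[3]
}

fn main() {
    const N: u32 = 10_000_000;

    let start = Instant::now();
    let mut sum = 0u64;
    for i in 0..N {
        // black_box hides the input from the optimiser
        sum += u64::from(get_red_bitpacked(black_box(i)));
    }
    println!("bitpacked: {:?} (checksum {})", start.elapsed(), sum);

    let start = Instant::now();
    let mut sum = 0u64;
    for i in 0..N {
        // to_le_bytes()[3] is the same byte as i >> 24
        sum += u64::from(get_red_array(black_box(i.to_le_bytes())));
    }
    println!("array:     {:?} (checksum {})", start.elapsed(), sum);
}
```

Expect the two loops to come out essentially identical, for the reasons shown in the assembly above.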

Second Answer

tl;dr use a struct:

struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

You could add #[repr(C)] if you need a guaranteed field order; repr(packed) is unnecessary here, since four u8 fields have no padding and the struct is already exactly 4 bytes.

It gives you the best of both worlds, and each channel gets a name.
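As a sketch of how such a struct could round-trip with a packed u32 in the 0xRRGGBBAA layout from the question (the method names here are illustrative, not from any crate):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

impl Color {
    // Pack as 0xRRGGBBAA: red in the high byte, alpha in the low byte.
    fn to_u32(self) -> u32 {
        u32::from_be_bytes([self.r, self.g, self.b, self.a])
    }

    fn from_u32(i: u32) -> Self {
        let [r, g, b, a] = i.to_be_bytes();
        Color { r, g, b, a }
    }
}

fn main() {
    let red = Color { r: 255, g: 0, b: 0, a: 255 };
    assert_eq!(red.to_u32(), 4278190335);
    assert_eq!(Color::from_u32(4278190335), red);
    // Four u8 fields need no padding, so the struct is 4 bytes as-is.
    assert_eq!(std::mem::size_of::<Color>(), 4);
}
```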

Are these costs significant enough to make bit packing worth it?

Heap allocation has a huge cost.

Are there any immediate insights someone could give on the drawbacks of these two operations?

Both are noise compared to allocating memory.