How to pack a Rust enum into its minimal size?


I have a Rust enum with some data. I want to pack it into as few bytes as possible. I tried using repr like this:

#[repr(u8)]
enum MyEnum {
    OptionA(u32),
    OptionB(u32),
    Nothing,
}

fn main() {
    println!("{}", std::mem::size_of::<MyEnum>()); // prints 8 (should be 5)
}

In theory, this should only need to take up 5 bytes (1 for a u8 discriminant and 4 for the u32s). But, regardless of what repr I use, it takes up the full 8 bytes, as if it was aligned to 4.

The official Rust docs make it clear that repr(u*) does what you expect for fieldless enums, but the section on enums with fields is ambiguous to me:

If the enum has fields, the effect is similar to the effect of repr(C) in that there is a defined layout of the type. This makes it possible to pass the enum to C code, or access the type's raw representation and directly manipulate its tag and fields.

So the layout is defined, but does the argument to repr just not do anything? Or is this a bug? This seems insane to me. I understand that it is often more performant to align fields, but what is the point of letting you specify a repr if it does nothing? If I want memory-efficient packing, do I have to implement it myself and lose all of Rust's pattern matching and safety guarantees?


There are 2 best solutions below

4
On BEST ANSWER

In general, this type has to be 8 bytes. It contains a u32, which is 4 bytes and must be aligned to a 4-byte address. In an array, every element must also be aligned to a multiple of 4 bytes, which requires the size of the type to be a multiple of 4. Otherwise, you could take a mutable reference to an element that is not aligned, and references are not allowed to be unaligned.
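As a minimal sketch of the rule above, the following reports the size, alignment, and two-element array size of the enum from the question. The function name layout_of_my_enum is made up for illustration, and the numbers in the comment assume a typical target where u32 has 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

#[repr(u8)]
#[allow(dead_code)]
enum MyEnum {
    OptionA(u32),
    OptionB(u32),
    Nothing,
}

/// Hypothetical helper: returns (size, alignment, size of a 2-element array) in bytes.
#[allow(dead_code)]
fn layout_of_my_enum() -> (usize, usize, usize) {
    (
        size_of::<MyEnum>(),      // 1 tag byte + 3 padding bytes + 4 payload bytes
        align_of::<MyEnum>(),     // inherited from the u32 payload
        size_of::<[MyEnum; 2]>(), // array stride equals the element size
    )
}
// On a target where u32 is 4-byte aligned this yields (8, 4, 16).
```

The array size is exactly twice the element size, which is why the element size itself has to be rounded up to a multiple of the alignment.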

Unaligned access is often slower, and on some architectures it kills the process with a SIGBUS. Some architectures that would normally have your process killed can have the kernel fix up the access at the enormous cost of a trap, a context switch into the kernel, two loads and some shifts, and then a context switch out of the kernel. People on those architectures usually prefer the SIGBUS, because then at least the problem is obvious. Even RISC-V, one of the newest architectures, doesn't guarantee fast unaligned access (it may trap into the kernel).

Note that the C compiler does the same thing:

#include <stdio.h>
#include <inttypes.h>

struct foo {
    uint8_t tag;
    uint32_t value;
};

int main(void)
{
    printf("%zu\n", sizeof(struct foo));
}

That prints 8. It is true that some C compilers offer packed representations, but they are nonstandard.
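For comparison, here is a rough Rust counterpart of that C struct. It is only a sketch: the names Foo, PackedFoo and foo_sizes are invented for illustration. Rust does support a packed representation on structs (though not on enums), which is what removes the padding.

```rust
use std::mem::size_of;

// Same field order and padding rules as the C struct above.
#[repr(C)]
#[allow(dead_code)]
struct Foo {
    tag: u8,
    value: u32,
}

// Packed variant: no padding, alignment lowered to 1.
#[repr(C, packed)]
#[allow(dead_code)]
struct PackedFoo {
    tag: u8,
    value: u32,
}

/// Hypothetical helper: returns (size of Foo, size of PackedFoo) in bytes.
#[allow(dead_code)]
fn foo_sizes() -> (usize, usize) {
    (size_of::<Foo>(), size_of::<PackedFoo>())
}
// On a target where u32 is 4-byte aligned this yields (8, 5).
```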

There has been some discussion of packed enums in Rust, but they have not been standardized yet, and so are not available.

0
On

As documented, #[repr(u8)] has two effects on an enum with fields:

  1. there is a defined layout of the type. This makes it possible to pass the enum to C code, or access the type's raw representation and directly manipulate its tag and fields.

  2. If the discriminant overflows the integer it has to fit in, it will produce a compile-time error

So it definitely does something. Though the #[repr(packed)] that you seem to be after does not currently work on enums.
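To illustrate the first effect, here is a small sketch of reading the tag directly from the raw representation. The method name tag is an assumption made for this example; the pointer cast relies on the defined layout that a primitive representation such as #[repr(u8)] gives the enum, with the discriminant stored first.

```rust
#[repr(u8)]
#[allow(dead_code)]
enum MyEnum {
    OptionA(u32),
    OptionB(u32),
    Nothing,
}

impl MyEnum {
    /// Hypothetical accessor: reads the u8 tag from the first byte of the value.
    #[allow(dead_code)]
    fn tag(&self) -> u8 {
        // SAFETY: with #[repr(u8)] the enum's layout begins with its u8
        // discriminant, so reading one byte through this pointer is valid.
        unsafe { *(self as *const Self as *const u8) }
    }
}
// Implicit discriminants count up from 0 in declaration order:
// OptionA is 0, OptionB is 1, Nothing is 2.
```

Without the repr attribute the layout is unspecified and this cast would not be sound.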

It is a very niche repr anyway and comes with serious drawbacks; the documentation even states that it

is not to be used lightly. Unless you have extreme requirements, this should not be used.

Should you, despite all the warnings and probably about zero benefit, still want to do this, you can use a manually tagged union:

#[repr(u8)]
enum Discriminant {
    OptionA,
    OptionB,
    Nothing,
}
union Payload {
    value: u32,
    nothing: (),
}
#[repr(packed)]
struct MyEnum {
    discriminant: Discriminant,
    payload: Payload,
}

You can even match on it, but creating a reference to the payload is UB, because references have to be aligned. To read the payload you have to use read_unaligned. That is true for any packed data structure, so unsafe is unavoidable to begin with (and is not something brought in only for the union access).
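A minimal sketch of how such a type might be wrapped, assuming constructors and an accessor that are not part of the answer itself (option_a, option_b, nothing and value are names invented here). The payload is read through a raw pointer with read_unaligned, so no reference to the misaligned field is ever created.

```rust
use std::ptr::addr_of;

#[repr(u8)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum Discriminant {
    OptionA,
    OptionB,
    Nothing,
}

#[derive(Clone, Copy)]
#[allow(dead_code)]
union Payload {
    value: u32,
    nothing: (),
}

#[repr(packed)]
struct MyEnum {
    discriminant: Discriminant,
    payload: Payload,
}

#[allow(dead_code)]
impl MyEnum {
    // Hypothetical constructors that keep tag and payload consistent.
    fn option_a(value: u32) -> Self {
        MyEnum { discriminant: Discriminant::OptionA, payload: Payload { value } }
    }
    fn option_b(value: u32) -> Self {
        MyEnum { discriminant: Discriminant::OptionB, payload: Payload { value } }
    }
    fn nothing() -> Self {
        MyEnum { discriminant: Discriminant::Nothing, payload: Payload { nothing: () } }
    }

    /// Hypothetical accessor: returns the u32 payload if the tag says there is one.
    fn value(&self) -> Option<u32> {
        // Copy the 1-byte tag out by value; no reference to a packed field is taken.
        let tag = self.discriminant;
        match tag {
            Discriminant::OptionA | Discriminant::OptionB => {
                // SAFETY: the constructors above always store `value` for these
                // tags, and read_unaligned tolerates the misaligned address.
                Some(unsafe { addr_of!(self.payload.value).read_unaligned() })
            }
            Discriminant::Nothing => None,
        }
    }
}
```

With this layout std::mem::size_of::<MyEnum>() is 5, at the cost of every payload access going through unsafe code.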