How does serde/bincode serialize byte arrays?


This code serializes an array of 32 bytes exactly as I want:

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
struct Hash([u8; 32]);

let hash = Hash([1u8; 32]);
let hash_bin = bincode::serialize(&hash).unwrap();
assert_eq!(hash_bin, [1u8; 32]);

How does it work?

According to https://serde.rs/impl-serializer.html there is a serialize_bytes() function, but bincode's implementation of it prepends a length to the data:

fn serialize_bytes(self, v: &[u8]) -> Result<()> {
    O::IntEncoding::serialize_len(self, v.len())?;
    self.writer.write_all(v).map_err(Into::into)
}

Which Serializer method does the code above call to serialize the 32 bytes as themselves, with no length prefix?


Context: I'm implementing a custom Serialize for a type, and I want it (under some circumstances) to serialize arrays of bytes so that bincode encodes them as raw bytes with no length prefix. Calling serialize_bytes() does not work for this because it adds a length prefix.

I want to understand how arrays of bytes are serialized by default, as I do not know which method to call in place of serialize_bytes() to get bytes without a length prefix.


How are [u8; N] and [u8] serialized?

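To cut straight to the point, here is how serde 1.0.151 implements Serialize for each type. Serde's built-in impls never call serialize_bytes for these types: without specialization, [u8; N] and [u8] go through the generic [T; N] and [T] impls, so byte arrays are treated like any other tuple or sequence.

// [T; N] is serialized as a tuple. However, this is only implemented for N from 0 to 32 inclusive.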
let mut seq = try!(serializer.serialize_tuple(N));
for e in self {
    try!(seq.serialize_element(e));
}
seq.end()

// [T] is serialized as a sequence.
serializer.collect_seq(self)

The methods serialize_tuple and collect_seq are implemented by the specific serializer you are using.
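
To make the difference concrete for the question's case, here is a minimal std-only sketch of the byte layout produced by the top-level bincode::serialize function in bincode 1.x, assuming its default fixed-width, little-endian u64 length encoding. It is a toy model, not bincode's code, and the names encode_as_tuple and encode_as_seq are made up for illustration.

```rust
// Toy model of the byte layout only; this is NOT bincode's implementation.
// Assumption: lengths are written as fixed-width little-endian u64 values.

/// Models `serialize_tuple`: the length is known statically, so only the
/// elements themselves are written.
pub fn encode_as_tuple<const N: usize>(bytes: &[u8; N]) -> Vec<u8> {
    bytes.to_vec()
}

/// Models `serialize_seq` and `serialize_bytes`: the length is written
/// first, followed by the elements.
pub fn encode_as_seq(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + bytes.len());
    out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
    out.extend_from_slice(bytes);
    out
}
```

Under these assumptions, a [u8; 32] takes 32 bytes when written as a tuple and 40 bytes (an 8-byte length plus 32 data bytes) when written as a seq or through serialize_bytes, which matches what the question observed.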

The easy way

One common problem is that serde only implements Serialize/Deserialize for arrays up to length 32. The easiest approach is to use a crate like serde_with which adds extra serialize/deserialize implementations you can attach to your structs. Here is an example taken from their documentation:

#[serde_as]
#[derive(Deserialize, Serialize)]
struct Arrays<const N: usize, const M: usize> {
    #[serde_as(as = "[_; N]")]
    constgeneric: [bool; N],

    #[serde_as(as = "Box<[[_; 64]; N]>")]
    nested: Box<[[u8; 64]; N]>,

    #[serde_as(as = "Option<[_; M]>")]
    optional: Option<[u8; M]>,
}

How can we implement it ourselves? Rust Playground

Serialize

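Performing serialization is actually quite easy. Serde's data model has no dedicated array type, so we need to choose between serialize_tuple and serialize_seq. The difference is that a tuple's length is known statically, while a seq's length is only known at run time (and may not be known up front at all). That is why bincode writes a length prefix for serialize_seq but not for serialize_tuple, so serialize_tuple is the one to use here.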

use serde::ser::{Serialize, SerializeTuple, Serializer};

pub fn serialize<S, T, const N: usize>(this: &[T; N], serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
    T: Serialize,
{
    let mut seq = serializer.serialize_tuple(N)?;
    for element in this {
        seq.serialize_element(element)?;
    }
    seq.end()
}

Deserialize

Deserialization, on the other hand, gets a bit more complicated. We need to define a visitor that specifies how each element is visited. Below is one example of how it could be done for an array in the general case, but it is not optimal because it first deserializes onto the stack. I also had to use unsafe code to initialize the array one element at a time; that unsafe code can easily be removed if T: Default or if a growable data structure like a Vec<T> is used instead. Treat this mainly as a guide to implementing deserialization for a sequence.

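use std::{fmt, marker::PhantomData, mem::MaybeUninit, ptr};
use serde::de::{Deserialize, Deserializer, Error, SeqAccess, Visitor};

pub fn deserialize<'de, D, T, const N: usize>(deserializer: D) -> Result<[T; N], D::Error>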
where
    D: Deserializer<'de>,
    T: 'de + Deserialize<'de>,
{
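    // Mirror serialize_tuple: with deserialize_seq, a format like bincode
    // would expect a length prefix that serialize_tuple never wrote.
    deserializer.deserialize_tuple(N, ArrayVisitor { _phantom: PhantomData })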
}

struct ArrayVisitor<'de, T, const N: usize> {
    _phantom: PhantomData<&'de [T; N]>,
}

impl<'de, T, const N: usize> Visitor<'de> for ArrayVisitor<'de, T, N>
where
    T: Deserialize<'de>,
{
    type Value = [T; N];

    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "array of length {}", N)
    }

    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        let mut array: MaybeUninit<[T; N]> = MaybeUninit::uninit();

        for index in 0..N {
            // Get next item as Result<Option<T>, A::Error>. Since we know
            // exactly how many elements we should receive, we can flatten
            // this to a Result<T, A::Error>.
            let next = seq.next_element::<T>()
                // invalid_length takes the number of elements actually seen.
                .and_then(|x| x.ok_or_else(|| Error::invalid_length(index, &self)));
        
            match next {
                Ok(x) => unsafe {
                    // Safety: We write into the array without reading any
                    // uninitialized memory and writes only occur within the
                    // array bounds at multiples of the array stride.
                    let array_base_ptr = array.as_mut_ptr() as *mut T;
                    ptr::write(array_base_ptr.add(index), x);
                },
                Err(err) => {
                    // Safety: We need to manually drop the parts we
                    // initialized before we can return.
                    unsafe {
                        let array_base_ptr = array.as_mut_ptr() as *mut T;

                        for offset in 0..index {
                            ptr::drop_in_place(array_base_ptr.add(offset));
                        }
                    }
                    
                    return Err(err)
                },
            }
        }

        // Safety: We have completely initialized every element
        unsafe { Ok(array.assume_init()) }
    }
}
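
As mentioned above, the unsafe code goes away if the elements are buffered in a Vec<T> first. Here is a rough std-only sketch of that idea, with a plain iterator of Result<T, E> standing in for serde's SeqAccess. The names collect_array and ArrayError are hypothetical and not part of serde; a real visitor would call seq.next_element() in the loop instead.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ArrayError<E> {
    /// The source reported an error for one element.
    Element(E),
    /// The source ended after this many elements (fewer than N).
    TooShort(usize),
}

/// Pulls exactly N items from `source`, buffering them in a Vec<T>.
/// If anything fails, the Vec drops the already-collected items itself,
/// so no manual drop_in_place is needed.
pub fn collect_array<T, E, I, const N: usize>(mut source: I) -> Result<[T; N], ArrayError<E>>
where
    I: Iterator<Item = Result<T, E>>,
{
    let mut buf = Vec::with_capacity(N);
    for index in 0..N {
        match source.next() {
            Some(Ok(item)) => buf.push(item),
            Some(Err(err)) => return Err(ArrayError::Element(err)),
            None => return Err(ArrayError::TooShort(index)),
        }
    }
    // The buffer holds exactly N elements here, so the conversion succeeds.
    match <[T; N]>::try_from(buf) {
        Ok(array) => Ok(array),
        Err(_) => unreachable!("buffer holds exactly N elements"),
    }
}
```

The trade-off is one heap allocation per array, which is usually acceptable unless the arrays are tiny and deserialized in a hot loop.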

If anyone is curious how derive(Deserialize) works on structs, I would recommend looking at this Rust Playground where I expanded macros and then cleaned up the output to be more human readable. Seeing how serialize/deserialize works can really help to demystify the process.