I'm trying to find both the quickest and the most idiomatic way(s) to parse an i32 from a vector containing ASCII in Rust. I have three functions (minus error handling, etc.):
#![feature(test)]
extern crate test;

fn v_0() -> i32 {
    let read_buf: Vec<u8> = vec![54, 52, 52];
    let num: i32 = String::from_utf8(read_buf)
        .unwrap()
        .parse()
        .unwrap();
    assert_eq!(num, 644);
    num
}

fn v_1() -> i32 {
    let read_buf: Vec<u8> = vec![54, 52, 52];
    let num: i32 = read_buf.iter().rev().enumerate().map(|(idx, val)| {
        (val - 48) as i32 * 10i32.pow(idx as u32)
    }).sum();
    assert_eq!(num, 644);
    num
}

fn v_2() -> i32 {
    use atoi::atoi;
    let read_buf: Vec<u8> = vec![54, 52, 52];
    let num = atoi(read_buf[..].try_into().unwrap()).unwrap();
    assert_eq!(num, 644);
    num
}

#[cfg(test)]
mod tests {
    use test::Bencher;
    use crate::{v_0, v_1, v_2};

    #[bench]
    fn v_0_bench(b: &mut test::Bencher) {
        let n = test::black_box(1000);
        b.iter(v_0)
    }

    #[bench]
    fn v_1_bench(b: &mut test::Bencher) {
        let n = test::black_box(1000);
        b.iter(v_1)
    }

    #[bench]
    fn v_2_bench(b: &mut test::Bencher) {
        let n = test::black_box(1000);
        b.iter(v_2)
    }
}
And the benchmark results are as follows:
test tests::v_0_bench ... bench: 16 ns/iter (+/- 0)
test tests::v_1_bench ... bench: 0 ns/iter (+/- 0)
test tests::v_2_bench ... bench: 11 ns/iter (+/- 1)
I suspect that the v_1 bench is getting optimized away, so as a question within a question: how do I prevent this?
Out of the 3 example functions, if any, which should one use and why?
Playing around with this on my computer gives the following results.
The buffer has been removed from the benchmark, and black_box() is used to pretend that the parameter is unpredictable and that the result is used. The sequence of bytes has been made longer so that the algorithms have actual work to do, and the invocations are repeated many times to make the differences in the results more visible.
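The benchmark listing itself is not reproduced here, so the following is only a minimal sketch of the same idea. It assumes stable Rust with std::hint::black_box (a best-effort hint to the optimizer) and a hand-rolled timing loop in place of the nightly #[bench] harness. The names parse_checked and time_parse are hypothetical, and the byte sequence is just an example.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical parser under test: UTF-8 check followed by str::parse,
/// i.e. the same approach as the question's v_0, but borrowing the bytes.
fn parse_checked(bytes: &[u8]) -> Option<i32> {
    std::str::from_utf8(bytes).ok()?.parse().ok()
}

/// Runs `parse_checked` `iterations` times and returns the elapsed time
/// together with a checksum of all parsed values.
fn time_parse(iterations: u32) -> (Duration, i64) {
    // The buffer is built once, outside the timed loop, so that the
    // allocation is not part of the measurement.
    let read_buf: Vec<u8> = b"123456789".to_vec();
    let mut checksum = 0i64;
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box on the input hides the constant bytes from the optimizer;
        // black_box on the output makes the result count as "used".
        let parsed = parse_checked(black_box(read_buf.as_slice()));
        checksum += i64::from(black_box(parsed).unwrap_or(0));
    }
    (start.elapsed(), checksum)
}
```

Calling time_parse(1_000_000) and dividing the returned duration by the iteration count gives a rough per-call figure. With the #[bench] harness the same pattern would apply inside the closure passed to b.iter.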
Comparing v_0 and v_1 shows that a substantial part of the time is spent checking the UTF-8 string. v_2 is probably disappointing because of its pow() operation. v_3 is clearly faster, but it does not check anything; the result could be meaningless because the bytes could contain anything but digits, and the resulting integer may also overflow if the sequence is too long! v_4 is similar to v_3 but handles an optional leading sign (because an i32 is expected, not a u32); this first check does not seem to be too detrimental to performance.
If we are absolutely certain that the sequence of bytes is correct, then we could lean towards the fastest version (v_3 or v_4), but in any other situation we should definitely prefer v_0.
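The listings for v_3 and v_4 are not shown above. As a rough illustration only, here is how a multiply-and-add parser without pow() and a sign-aware variant of it might look. The function names are hypothetical, and this is not necessarily the code that was benchmarked.

```rust
/// Multiply-and-add over the bytes with no validation at all
/// (in the spirit of the v_3 described above).
/// Non-digit bytes or too many digits silently give a meaningless result;
/// wrapping arithmetic is used so that this never panics.
fn parse_unchecked(bytes: &[u8]) -> i32 {
    bytes.iter().fold(0i32, |acc, &b| {
        acc.wrapping_mul(10)
            .wrapping_add(i32::from(b.wrapping_sub(b'0')))
    })
}

/// Same as `parse_unchecked`, but looks at the first byte for an optional
/// '-' or '+' sign (in the spirit of the v_4 described above).
fn parse_signed_unchecked(bytes: &[u8]) -> i32 {
    match bytes.split_first() {
        Some((&b'-', rest)) => parse_unchecked(rest).wrapping_neg(),
        Some((&b'+', rest)) => parse_unchecked(rest),
        _ => parse_unchecked(bytes),
    }
}
```

For the bytes from the question, parse_unchecked(&[54, 52, 52]) yields 644. Neither function reports an error for input such as b"6x4", which is exactly the trade-off described above.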