I'm trying to use snrm2 to perform a single precision float calculation in Rust. I'm linking to the Accelerate framework on OSX and using the blas crate for the C-bridge. Regardless of the randomly generated vector values (which usually calculate to [12, 13] in the native code) the snrm2 logic consistently returns:
- -36893490000000000000
- -2
- 36893490000000000000
- 0
This same code works as advertised when I switch everything to f64 and dnrm2. Is this a bug within Accelerate, or am I not meeting some of its internal assumptions about memory alignment or call arguments?
use rand::prelude::*;
use blas::*;
type Vector = [f32; 2048];
// Native implementation
fn euclidean_distance_standard(a: &Vector, b: &Vector) -> f32 {
a.iter().zip(b.iter()).fold(0.0, |acc, (&x, &y)| {
let diff = x - y;
acc + diff * diff
}).sqrt()
}
// Using snrm2 from BLAS
fn euclidean_distance_blas(a: &Vector, b: &Vector) -> f32 {
let mut diff = [0.0; 2048];
for i in 0..2048 {
diff[i] = a[i] - b[i];
}
unsafe {
snrm2(diff.len() as i32, &diff, 1)
}
}
fn main() {
let mut rng = rand::thread_rng();
// Initialize a random vector
let mut vector_a = [0.0; 2048];
for i in 0..2048 {
vector_a[i] = rng.gen::<f32>();
}
let vector_b = [0.5; 2048];
let result_standard = euclidean_distance_standard(&vector_a, &vector_b);
let result_blas = euclidean_distance_blas(&vector_a, &vector_b);
println!("Standard Method Result: {}", result_standard);
println!("OpenBLAS Method Result: {}", result_blas);
}
# build.rs
fn main() {
println!("cargo:rustc-link-lib=framework=Accelerate");
}
# Cargo.toml
[dependencies]
rand = "0.8"
blas = "0.21"
blas-src = { version = "0.9", features = ["accelerate"] }
libc = "0.2.36"
$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/similarity-comp`
Standard Method Result: 13.005363
OpenBLAS Method Result: -36893490000000000000
This is a bug but probably in the wrapper, I tried in a few different ways to compute that.
First hypothesis was that the error was in
sgrm2, then I tried to compute the squared norm using sdot, and the same behavior is observed.Then I considerd another routine returning an
f32, the 1-norm, sasum, the buggy results are observed there as well.Then I tried one function that receives a vector of
f32and returns an integer, isamax, and it works fine.This happens both using feature
accelerateoropenblas.So my hypothesis is that the wrapper has a problem when returning
f32. What I did next was to pick a level 2 function sgemv that computesy = alpha * A * x + beta * y, settingalpha=1.0,beta=0.0, andA = x'we gety = x'*xthat is the square of the norm. The difference is that this will be written to a mutable[f32]argument, instead of returned by the function. In this case we get the expected result.Here is the complete program you can test on your side
The input vector has 2-norm 7 (squared norm 49), and 1-norm 13.
The program output was correct for all fp64 routines, for fp32 routines only isamax and sgemv behaved as expected.