snrm2 calculation instability for single-precision floats on Accelerate

I'm trying to use snrm2 to perform a single-precision float calculation in Rust. I'm linking against the Accelerate framework on macOS and using the blas crate as the C bridge. Regardless of the randomly generated vector values (whose norm usually comes out between 12 and 13 with the native implementation), snrm2 consistently returns one of:

  • -36893490000000000000
  • -2
  • 36893490000000000000
  • 0

This same code works as advertised when I switch everything to f64 and dnrm2. Is this a bug within Accelerate, or am I not meeting some of its internal assumptions about memory alignment or call arguments?

use rand::prelude::*;
use blas::*;
type Vector = [f32; 2048];

// Native implementation
fn euclidean_distance_standard(a: &Vector, b: &Vector) -> f32 {
    a.iter().zip(b.iter()).fold(0.0, |acc, (&x, &y)| {
        let diff = x - y;
        acc + diff * diff
    }).sqrt()
}

// Using snrm2 from BLAS
fn euclidean_distance_blas(a: &Vector, b: &Vector) -> f32 {
    let mut diff = [0.0; 2048];
    for i in 0..2048 {
        diff[i] = a[i] - b[i];
    }
    unsafe {
        snrm2(diff.len() as i32, &diff, 1)
    }
}

fn main() {
    let mut rng = rand::thread_rng();

    // Initialize a random vector
    let mut vector_a = [0.0; 2048];
    for i in 0..2048 {
        vector_a[i] = rng.gen::<f32>();
    }

    let vector_b = [0.5; 2048];

    let result_standard = euclidean_distance_standard(&vector_a, &vector_b);
    let result_blas = euclidean_distance_blas(&vector_a, &vector_b);

    println!("Standard Method Result: {}", result_standard);
    println!("OpenBLAS Method Result: {}", result_blas);
}


# build.rs
fn main() {
    println!("cargo:rustc-link-lib=framework=Accelerate");
}

# Cargo.toml
[dependencies]
rand = "0.8"
blas = "0.21"
blas-src = { version = "0.9", features = ["accelerate"] }
libc = "0.2.36"

$ cargo run --release
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/similarity-comp`
Standard Method Result: 13.005363
OpenBLAS Method Result: -36893490000000000000
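
For reference, this is roughly what the working f64 variant looks like (a sketch; it just mirrors euclidean_distance_blas with the types swapped and dnrm2 in place of snrm2):

use blas::*;

// Same distance computation in double precision; this version returns the
// expected value (around 13 for these inputs).
fn euclidean_distance_blas_f64(a: &[f64; 2048], b: &[f64; 2048]) -> f64 {
    let mut diff = [0.0f64; 2048];
    for i in 0..2048 {
        diff[i] = a[i] - b[i];
    }
    unsafe { dnrm2(diff.len() as i32, &diff, 1) }
}
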
There are 2 answers below.

Answer from Bob:

This is a bug, but probably in the wrapper. I tried computing the norm in a few different ways.

My first hypothesis was that the error was in snrm2 itself, so I computed the squared norm with sdot instead, and the same behavior appeared.

Then I tried another routine that returns an f32, sasum (the 1-norm), and the buggy results show up there as well.

Then I tried a function that takes a vector of f32 but returns an integer, isamax, and it works fine.

This happens with both the accelerate and openblas features.

So my hypothesis is that the wrapper has a problem when the return value is an f32. Next I picked a level 2 routine, sgemv, which computes y = alpha * A * x + beta * y. Setting alpha = 1.0, beta = 0.0, and A = x' (a 1-by-n matrix), we get y = x' * x, the square of the norm. The difference is that the result is written into a mutable [f32] argument instead of being returned by the function, and in this case we get the expected value.

Here is a complete program you can test on your side:


extern crate blas;
use blas::*;

fn main() {
    let v1: [f32; 6] = [2.0, 0.0, 5.0, 4.0, 2.0, 0.0];
    let mut y1: [f32; 1] = [0.0];
    let v1d: [f64; 6] = [2.0, 0.0, 5.0, 4.0, -2.0, 0.0];
    let mut y1d: [f64; 1] = [0.0];
    let n = v1.len() as i32;

    // y1 := 1.0 * A * v1 + 0.0 * y1, with A the 1-by-n matrix v1' (lda = 1),
    // so y1[0] = v1' * v1, the squared 2-norm, written into a &mut [f32].
    unsafe {
        sgemv(b'N', 1, n, 1.0, &v1, 1, &v1, 1, 0.0, &mut y1, 1);
    }
    println!("snrm2={:.3}", unsafe { snrm2(v1.len() as i32, &v1, 1) });
    println!("|x|_1 via sasum={:.3}", unsafe { sasum(v1.len() as i32, &v1, 1) });
    println!("|x|^2 via sdot={:.3}", unsafe { sdot(v1.len() as i32, &v1, 1, &v1, 1) });
    println!("argmax isamax={}", unsafe { isamax(v1.len() as i32, &v1, 1) });
    println!("|x|^2 via sgemv={:.3}", y1[0]);

    // Same computations in double precision.
    unsafe {
        dgemv(b'N', 1, n, 1.0, &v1d, 1, &v1d, 1, 0.0, &mut y1d, 1);
    }
    println!("dnrm2={:.3}", unsafe { dnrm2(v1d.len() as i32, &v1d, 1) });
    println!("|x|_1 via dasum={:.3}", unsafe { dasum(v1d.len() as i32, &v1d, 1) });
    println!("|x|^2 via ddot={:.3}", unsafe { ddot(v1d.len() as i32, &v1d, 1, &v1d, 1) });
    println!("argmax via idamax={}", unsafe { idamax(v1d.len() as i32, &v1d, 1) });
    println!("|x|^2 via dgemv={:.3}", y1d[0]);
}

The input vector has 2-norm 7 (squared norm 49), and 1-norm 13.

The program output was correct for all fp64 routines; among the fp32 routines, only isamax and sgemv behaved as expected.

snrm2=0.000
|x|_1 via sasum=0.000
|x|^2 via sdot=0.000
argmax isamax=3
|x|^2 via sgemv=49.000
dnrm2=7.000
|x|_1 via dasum=13.000
|x|^2 via ddot=49.000
argmax via idamax=3
|x|^2 via dgemv=49.000
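
If you need a drop-in substitute for snrm2 while the wrapper is broken, the sgemv trick above can be packaged into a small helper along these lines (a sketch; the function name is mine):

use blas::*;

// Sketch: compute the 2-norm of x as sqrt(x' * x) via sgemv, so the result is
// written into a &mut [f32] output argument rather than returned by value.
fn norm2_via_sgemv(x: &[f32]) -> f32 {
    let n = x.len() as i32;
    let mut y = [0.0f32; 1];
    unsafe {
        // y := 1.0 * A * x + 0.0 * y, where A is the 1-by-n matrix x' (lda = 1)
        sgemv(b'N', 1, n, 1.0, x, 1, x, 1, 0.0, &mut y, 1);
    }
    y[0].sqrt()
}

Note that this squares the elements before taking the square root, so unlike a real nrm2 implementation it does not rescale to avoid overflow or underflow for extreme values; for data in [0, 1] as in the question, that is not an issue.
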
Answer from icyfox:

I got in touch with Apple about this, and they pointed me to their latest implementation of the Fortran77 interface, which returns doubles for all floating-point values (the old f2c convention). The CBLAS declaration, by contrast, returns a Float:

func cblas_snrm2(
    _ __N: Int32,
    _ __X: UnsafePointer<Float>!,
    _ __incX: Int32
) -> Float

The blas crate specifies the return value as f32 (so it should probably be patched). But declaring the foreign function locally like this gives the right answer, at least on ARM:

extern crate libc;
use libc::c_int;

extern "C" {
    // Accelerate's Fortran-style symbol: declare the return type as f64 to
    // match the f2c convention, even though the input values are single precision.
    fn snrm2_(n: *const c_int, x: *const f32, incx: *const c_int) -> f64;
}

pub unsafe fn snrm2(n: i32, x: &[f32], incx: i32) -> f64 {
    snrm2_(&n, x.as_ptr(), &incx)
}
Standard: 173.71765, OpenBLAS: 173.71761182650607
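
For completeness, the corrected binding can be dropped into the question's euclidean_distance_blas roughly like this (a sketch; the f64 result is narrowed back to f32 to keep the original signature):

use libc::c_int;

extern "C" {
    fn snrm2_(n: *const c_int, x: *const f32, incx: *const c_int) -> f64;
}

fn euclidean_distance_blas_fixed(a: &[f32; 2048], b: &[f32; 2048]) -> f32 {
    let mut diff = [0.0f32; 2048];
    for i in 0..2048 {
        diff[i] = a[i] - b[i];
    }
    let n = diff.len() as i32;
    let incx = 1;
    // Accelerate's Fortran symbol returns a double; narrow it back to f32 here.
    unsafe { snrm2_(&n, diff.as_ptr(), &incx) as f32 }
}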