How to add all elements of a vector (int16) in OpenCL on a SIMD vector processor?


While doing matrix multiplication on a SIMD processor, I am having trouble adding all the elements of a vector.

float16 sum = row * column;

Now sum is a vector variable holding 16 values. I want to add all of these values together for the matrix multiplication. Is there a built-in function for this in OpenCL, or a way to use the MAC unit?

PS: the dot function only works with vectors of up to four components (float, float2, float3, float4).


There are 2 best solutions below


Assuming you want to perform a matrix-vector multiplication y=M*x with every GPU thread working on different data, you could do the float16 dot product manually:

float sum = row.s0*column.s0+row.s1*column.s1+...+row.sf*column.sf;
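Since dot accepts at most four components, another way to write the same sum is as four 4-wide dot products. Below is a minimal host-side C sketch of that idea, not OpenCL kernel code: dot4 and dot16 are hypothetical helper names, and dot4 only stands in for the built-in dot on float4. In a kernel, the corresponding expression would presumably be dot(row.s0123, column.s0123) + dot(row.s4567, column.s4567) + dot(row.s89ab, column.s89ab) + dot(row.scdef, column.scdef).

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two 4-element chunks.
   It stands in for OpenCL's built-in dot(float4, float4). */
static float dot4(const float *a, const float *b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Hypothetical helper: 16-element dot product assembled from
   four 4-wide dot products over consecutive chunks. */
static float dot16(const float row[16], const float column[16]) {
    float sum = 0.0f;
    for (size_t k = 0; k < 16; k += 4) {
        sum += dot4(row + k, column + k); /* chunks 0-3, 4-7, 8-11, 12-15 */
    }
    return sum;
}
```

Whether this is any faster than the fully written-out sum depends on the compiler and the device, so it is worth measuring both.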

However, the better solution here is to ditch OpenCL vector data types entirely and use scalar float instead. Then y=M*x is implemented like this:

#define def_dim 16 // matrix dimension
float M[def_dim*def_dim]; // matrix
float x[def_dim]; // input vector

// fill M and x with data

float y[def_dim]; // result vector
#pragma unroll // improve performance by loop unrolling
for(uint i=0; i<def_dim; i++) {
    y[i] = 0.0f;
    #pragma unroll // improve performance by loop unrolling
    for(uint j=0; j<def_dim; j++) y[i] = fma(M[i*def_dim+j], x[j], y[i]); // multiply y=M*x
}

So to add the elements of the vector, add them all individually. As of now, OpenCL provides no standard built-in function that sums the components of a single vector.

        // Adding all the elements of the vector
        float desiredSum = 0.0f; // scalar accumulator for the 16 components
        desiredSum += sum.s0;
        desiredSum += sum.s1;
        desiredSum += sum.s2;
        desiredSum += sum.s3;
        desiredSum += sum.s4;
        desiredSum += sum.s5;
        desiredSum += sum.s6;
        desiredSum += sum.s7;
        desiredSum += sum.s8;
        desiredSum += sum.s9;
        desiredSum += sum.sa;
        desiredSum += sum.sb;
        desiredSum += sum.sc;
        desiredSum += sum.sd;
        desiredSum += sum.se;
        desiredSum += sum.sf;
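The sixteen sequential additions above can also be arranged as a pairwise (tree) reduction: add the upper half of the vector onto the lower half, then repeat on the result until one value is left. Below is a minimal host-side C sketch of that pattern, assuming a plain array in place of float16; reduce_add16 is a hypothetical name, not an OpenCL function. In a kernel, the same steps would presumably read float8 s8 = sum.lo + sum.hi; float4 s4 = s8.lo + s8.hi; float2 s2 = s4.lo + s4.hi; float desiredSum = s2.x + s2.y;

```c
/* Hypothetical helper: sum 16 floats by repeatedly adding the
   upper half of the working range onto the lower half. */
static float reduce_add16(const float v[16]) {
    float tmp[16];
    for (int i = 0; i < 16; i++) {
        tmp[i] = v[i]; /* work on a copy so the input stays untouched */
    }
    /* Active widths 8, 4, 2, 1 correspond to float8, float4, float2, float. */
    for (int width = 8; width >= 1; width /= 2) {
        for (int i = 0; i < width; i++) {
            tmp[i] += tmp[i + width];
        }
    }
    return tmp[0];
}
```

This takes four vector-wide steps instead of sixteen dependent scalar additions, which may suit a SIMD unit better, though the actual benefit depends on the compiler and hardware. Note that the additions happen in a different order, so floating-point results can differ slightly from the sequential version.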