While doing matrix multiplication with SIMD vector types, I am having trouble adding up all the elements of a vector:
float16 sum = row * column;
Now sum is a vector variable holding 16 values, and I want to add all of them together to get one element of the matrix product. Is there a built-in function in OpenCL for this, or a way to use the MAC unit?
PS: the dot function only works with float4.
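As far as I know, OpenCL C has no built-in that adds up all components of a float16, and `dot` accepts only scalars and vectors of up to four components. One common workaround is to fold the vector in halves with component selectors, e.g. `float8 t = sum.lo + sum.hi; float4 q = t.lo + t.hi; float total = dot(q, (float4)(1.0f));`. The plain C sketch below emulates that folding so it can be compiled and checked on the host. The `vec16` struct is a hypothetical stand-in for OpenCL's float16, and `mul16`/`hsum16` are illustrative names, not OpenCL built-ins.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float16 (plain C has no such type). */
typedef struct { float s[16]; } vec16;

/* Element-wise product, like `float16 sum = row * column;` in OpenCL C. */
static vec16 mul16(const vec16 *a, const vec16 *b)
{
    vec16 r;
    for (size_t i = 0; i < 16; i++)
        r.s[i] = a->s[i] * b->s[i];
    return r;
}

/* Horizontal sum by repeated halving, mirroring the OpenCL C idiom:
   float8 t = v.lo + v.hi;   (.lo = s0..s7, .hi = s8..sF)
   float4 q = t.lo + t.hi;   (.lo = s0..s3, .hi = s4..s7)
   float  r = dot(q, (float4)(1.0f));                              */
static float hsum16(const vec16 *v)
{
    float t8[8], t4[4];
    for (size_t i = 0; i < 8; i++)
        t8[i] = v->s[i] + v->s[i + 8];   /* v.lo + v.hi */
    for (size_t i = 0; i < 4; i++)
        t4[i] = t8[i] + t8[i + 4];       /* t.lo + t.hi */
    /* dot(q, (float4)(1.0f)) is simply the sum of the four lanes */
    return t4[0] + t4[1] + t4[2] + t4[3];
}
```

The halving order keeps the additions balanced; for general float data the result may differ in the last bits from a plain left-to-right sum.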
Assuming you want to perform a matrix-vector multiplication y = M*x with every GPU thread working on different data, you could compute the float16 dot product manually, multiplying the 16 component pairs and adding the products one by one. However, the better solution here is to ditch OpenCL vector data types entirely and go with plain float instead. Then y = M*x is implemented like this:
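Below is a minimal C sketch of that scalar approach, written as host code so it runs without an OpenCL device. It assumes a square, row-major n x n matrix; `matvec` and its parameters are hypothetical names. In an actual kernel the outer loop would go away: each work-item would take its row from `get_global_id(0)` and run only the inner loop, and if you want the multiply-accumulate to be explicit you could write it with OpenCL's `fma(a, b, c)` or `mad(a, b, c)` built-ins.

```c
#include <stddef.h>

/* y = M * x for a square, row-major n x n matrix M (assumed layout). */
static void matvec(const float *M, const float *x, float *y, size_t n)
{
    /* In OpenCL this loop is replaced by: row = get_global_id(0); */
    for (size_t row = 0; row < n; row++) {
        float sum = 0.0f;
        /* Scalar dot product of one matrix row with x. In a kernel this
           line could be written as sum = fma(M[...], x[i], sum); */
        for (size_t i = 0; i < n; i++)
            sum += M[row * n + i] * x[i];
        y[row] = sum;
    }
}
```

With one row per work-item, the whole reduction for an element of y stays inside a single thread as a scalar running sum, so no horizontal vector add is needed at all.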