Suppose I have two integer arrays in device memory (CUDA C code).
Example: x = [1, 2, 4, 8, 16, 32], y = [2, 5, 10, 20, 40, 50]
I want to do element-wise multiplication using cuBLAS.
I tried this and it works, but I don't think this is how cuBLAS is meant to be used:
// scale the single element y[i] by x[i]
// (x[i] is a device pointer, so this requires CUBLAS_POINTER_MODE_DEVICE)
for (int i = 0; i < n; i++) {
    cublasSscal(handle, 1, &x[i], &y[i], 1);
}
and then the result is saved in y.
Result: y = [2, 10, 40, 160, 640, 1600]
Can i do the above multiplication in cuBLAS without using for loop?
Thanks
I expect to avoid the for loop.
Note that cuBLAS doesn't have any options for handling integer data in most cases (except certain gemm operations that tap into tensor cores, but those only support 8-bit integers or smaller). If you must use integer data, I would recommend one of the other approaches below, such as writing your own kernel or using thrust.
(I'm just copying my answer from here.)
For floating-point data, it's possible to use the cuBLAS dgmm function to do a vector elementwise multiply.
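A minimal sketch of the dgmm approach (assuming float data rather than the int data in the question; error checking omitted for brevity): treat y as an n x 1 matrix and left-multiply by diag(x), so c[i] = x[i] * y[i].

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 6;
    float hx[n] = {1, 2, 4, 8, 16, 32};
    float hy[n] = {2, 5, 10, 20, 40, 50};
    float *dx, *dy, *dc;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // C = diag(x) * A, where A is the n x 1 matrix holding y,
    // i.e. c[i] = x[i] * y[i]
    cublasSdgmm(handle, CUBLAS_SIDE_LEFT, n, 1, dy, n, dx, 1, dc, n);

    float hc[n];
    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%g ", hc[i]);
    printf("\n");  // expected: 2 10 40 160 640 1600
    cublasDestroy(handle);
    cudaFree(dx); cudaFree(dy); cudaFree(dc);
    return 0;
}
```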
However, it's trivial to write a CUDA kernel to perform this task (it would be a trivial modification of the CUDA vectorAdd sample code), and I expect it would be faster than the dgmm approach.
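Such a kernel might look like this (a sketch using the integer data from the question; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// elementwise multiply: y[i] *= x[i] (works for int, unlike cuBLAS)
__global__ void elementwiseMul(const int *x, int *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= x[i];
}

int main() {
    const int n = 6;
    int hx[n] = {1, 2, 4, 8, 16, 32};
    int hy[n] = {2, 5, 10, 20, 40, 50};
    int *dx, *dy;
    cudaMalloc(&dx, n * sizeof(int));
    cudaMalloc(&dy, n * sizeof(int));
    cudaMemcpy(dx, hx, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(int), cudaMemcpyHostToDevice);

    // one thread per element, rounded up to whole 256-thread blocks
    elementwiseMul<<<(n + 255) / 256, 256>>>(dx, dy, n);

    cudaMemcpy(hy, dy, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%d ", hy[i]);
    printf("\n");  // expected: 2 10 40 160 640 1600
    cudaFree(dx); cudaFree(dy);
    return 0;
}
```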
Also see here for a thrust (and dgmm) suggestion.
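With thrust, the whole operation is a single transform, and integer data is not a problem. A minimal sketch (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

int main() {
    int hx[] = {1, 2, 4, 8, 16, 32};
    int hy[] = {2, 5, 10, 20, 40, 50};
    thrust::device_vector<int> x(hx, hx + 6);
    thrust::device_vector<int> y(hy, hy + 6);
    // y[i] = x[i] * y[i], computed entirely on the device
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      thrust::multiplies<int>());
    for (int i = 0; i < 6; i++) printf("%d ", (int)y[i]);
    printf("\n");  // expected: 2 10 40 160 640 1600
    return 0;
}
```

(If the arrays are already raw device pointers, wrap them with thrust::device_ptr<int> before passing them to thrust::transform.)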
It looks like it could probably be done with sbmv also.
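A sketch of the sbmv idea (again assuming float data): with bandwidth k = 0, the "band matrix" stored in dx is just its main diagonal, so Ssbmv computes out = alpha * diag(x) * y + beta * out. Error checking omitted for brevity.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 6;
    float hx[n] = {1, 2, 4, 8, 16, 32};
    float hy[n] = {2, 5, 10, 20, 40, 50};
    float hout[n];
    float *dx, *dy, *dout;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMalloc(&dout, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    float alpha = 1.0f, beta = 0.0f;
    // k = 0 and lda = 1: the band matrix reduces to diag(x), so
    // out = 1.0 * diag(x) * y + 0.0 * out = x .* y
    cublasSsbmv(handle, CUBLAS_FILL_MODE_LOWER, n, 0,
                &alpha, dx, 1, dy, 1, &beta, dout, 1);

    cudaMemcpy(hout, dout, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%g ", hout[i]);
    printf("\n");  // expected: 2 10 40 160 640 1600
    cublasDestroy(handle);
    cudaFree(dx); cudaFree(dy); cudaFree(dout);
    return 0;
}
```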
This operation (regardless of which approach above is used) extends directly to a matrix-matrix elementwise product, simply by treating the matrices as vectors; in some settings it may be referred to as a Hadamard product.