I'm studying how to offload some quite heavy calculations on GPUs.
Although on my machine I have a NVIDIA RTX GPU
, I would like to avoid using
CUDA in order to develop something portable on other GPUs as well (at least in its core).
Thus the choice of OpenCL.
Now, my current biggest concern is that, within the core that is suitable for offload I intensively make use of LAPACK SVD implementation. However, in OpenCL, kernel code cannot either:
- Be linked to external libraries. There's a "workaraound" using
clEnqueueNativeKernel()
, but this does not seem to apply in this case (call within a kernel itself) (not to mention this is not very portable, since it is needed the device to supportCL_EXEC_NATIVE_KERNEL
capability); - Accept function pointers as kernel arguments.
So, does anyone know of the existence of a OpenCL kernel SVD open-source implemetation, which can then be called within a parent OpenCL kernel?
I googled, and found several libraries/implementations of SVD for GPU offload, but I couldn't see how to "embed" them into an OpenCL kernel (they all seem implementations to be launched from host code). If I'm wrong, please correct me. Any help is more than welcome.
Implement an event-callback API between host and kernel using only atomic functions such that:
From https://man.opencl.org/atomic_store.html:
I don't know if your graphics card / driver supports this. OpenCL 2.0 may not be fully supported by all GPUs.
To make host-side libraries run directly on GPU, you'll need to convert some parts by hand:
Latency of just an atomically-triggered GPU-library call should be negligible if the work is heavy but it's not suitable when every clock-cycle is required on GPU-side. So it wouldn't be good for working with small matrices.