With CUDA 12.0, support has been added for loading libraries of kernels dynamically, from disk or from memory: Driver API, § 6.12 Library Management. From these libraries, one can obtain "kernels" without an associated device or context. Their handle type is CUkernel, as opposed to CUfunction for proper, in-context kernels.
Now, in the § 6.22 Execution Control section of the Driver API, various launch functions are described as taking either "a CUDA function CUfunction or a CUDA kernel CUkernel": cuLaunchKernel, cuLaunchKernelEx, cuLaunchCooperativeKernel, and perhaps others.
The thing is, when I look at their signatures, they all still take plain old CUfunction's, not CUkernel's - and there is no overloaded function differing in the choice of this parameter.
So, what gives? Can we launch CUkernel's, or can't we?
You can cast a CUkernel to a CUfunction when using it with cuLaunchKernel, as indicated in this part of the CUDA Driver API documentation. For more information, check out this blog post on context-independent module loading. Thanks!
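To make the cast concrete, here is a minimal sketch of the whole flow. It assumes CUDA 12.0+, a hypothetical compiled module "kernels.cubin" containing a parameterless kernel named "myKernel", and it elides error checking on every driver call for brevity:

```c
#include <cuda.h>

int main(void) {
    cuInit(0);

    // Load a library of kernels; no device or context is required yet.
    CUlibrary lib;
    cuLibraryLoadFromFile(&lib, "kernels.cubin",
                          NULL, NULL, 0,   // no JIT options
                          NULL, NULL, 0);  // no library options

    // Obtain a context-independent kernel handle (CUkernel, not CUfunction).
    CUkernel kernel;
    cuLibraryGetKernel(&kernel, lib, "myKernel");

    // A context is still needed at launch time; create one the usual way.
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // cuLaunchKernel's first parameter is declared as CUfunction, but a
    // CUkernel may be cast to it; the driver resolves the kernel against
    // the current context.
    cuLaunchKernel((CUfunction)kernel,
                   1, 1, 1,     // grid dimensions
                   1, 1, 1,     // block dimensions
                   0, NULL,     // dynamic shared memory, stream
                   NULL, NULL); // kernel params (none here), extra
    cuCtxSynchronize();

    cuLibraryUnload(lib);
    cuCtxDestroy(ctx);
    return 0;
}
```

Note that the cast does not remove the need for a current context at launch time; it only defers the kernel-to-context binding from load time to launch time.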