I am trying, unsuccessfully, to launch a template kernel as a cooperative kernel in CUDA C++. What am I doing wrong?
The error is:
Error: cannot determine which instance of function template "boolPrepareKernel" is intended
I try to invoke the kernel like this:
ForBoolKernelArgs<int> fbArgs = ...;
int device = 0;
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, device);
cudaLaunchCooperativeKernel((void*)boolPrepareKernel, deviceProp.multiProcessorCount, fFArgs.threads, fbArgs) ;
The kernel is defined like this:
template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
...
}
I tried to parameterize the launch (in this example with int) like this:
cudaLaunchCooperativeKernel((void*)(<int>boolPrepareKernel), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs) ;
but I get the error
no instance of overloaded function matches the argument list argument types are: (<error-type>, int, dim3, ForBoolKernelArgs<int>)
For the suggested case
cudaLaunchCooperativeKernel((void*)(boolPrepareKernel<int>), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs)
the error is
no instance of overloaded function matches the argument list argument types are: (void *, int, dim3, ForBoolKernelArgs<int>)
This is probably something simple, but I am stuck. Thanks for any help!
For reference, an ordinary kernel launch like
boolPrepareKernel<<<fFArgs.blocks, fFArgs.threads>>>(fbArgs);
works, but of course grid synchronization is unavailable.
Here is a minimal example that will compile:
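A sketch of such a launch, under some assumptions: the question does not show `ForBoolKernelArgs`, so the one-field struct below is a hypothetical stand-in, and the block size of 128 is an arbitrary placeholder for `fFArgs.threads`. The kernel body only performs a grid-wide sync to show that the cooperative launch is in effect.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical stand-in: the real ForBoolKernelArgs is not shown in the question.
template <typename T>
struct ForBoolKernelArgs {
    T value;
};

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
    cg::grid_group grid = cg::this_grid();
    grid.sync();  // grid-wide barrier; requires a cooperative launch
}

int main() {
    int device = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, device);
    if (!deviceProp.cooperativeLaunch) {
        printf("device does not support cooperative launch\n");
        return 0;
    }

    ForBoolKernelArgs<int> fbArgs{42};
    dim3 gridDim(deviceProp.multiProcessorCount);  // one block per SM
    dim3 blockDim(128);                            // assumed block size

    // Kernel arguments are passed as an array of pointers, one per parameter.
    void* kernelArgs[] = { &fbArgs };

    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)(boolPrepareKernel<int>),  // explicit template instance
        gridDim, blockDim, kernelArgs,
        0,   // dynamic shared memory in bytes
        0);  // default stream
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Compared with the attempts in the question, the differences are the explicit `boolPrepareKernel<int>` instance and the fourth argument, which is `kernelArgs` (a `void**`) rather than `fbArgs` itself.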
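Probably the main remaining issue is that you are not passing the kernel arguments the way cudaLaunchCooperativeKernel requires: its fourth parameter is a void** (an array of pointers to the kernel arguments), not the argument object itself, which is why the call with ForBoolKernelArgs<int> in that position matches no overload.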