Consider the following CUDA program, in a file named `foo.cu`:
```cuda
#include <cooperative_groups.h>
#include <stdio.h>

__global__ void my_kernel() {
    auto g = cooperative_groups::this_grid();
    g.sync();
}

int main(int, char **) {
    cudaLaunchCooperativeKernel( (const void*) my_kernel, 2, 2, nullptr, 0, nullptr);
    cudaDeviceSynchronize();
}
```
This program doesn't do much, but it's a valid program (if your compute capability is high enough to support the entire grid as a cooperative group). It should compile, link, and run. However, I get this:
```
$ nvcc -o foo -gencode arch=compute_61,code=sm_61 foo.cu
ptxas fatal : Unresolved extern function 'cudaCGGetIntrinsicHandle'
```
Surprising! It doesn't help if I add some specific `-l` and `-L` flags, e.g.:
```
$ nvcc -o foo -gencode arch=compute_61,code=sm_61 foo.cu -L"/usr/lib/x86_64-linux-gnu/" \
    -L"/usr/lib/x86_64-linux-gnu/stubs" -lcudadevrt -lcudart_static -lrt -lpthread -ldl
ptxas fatal : Unresolved extern function 'cudaCGGetIntrinsicHandle'
```
Why is this happening? And how should I modify the nvcc command line so that it finds the unresolved symbol?
Notes:
- I'm using Devuan GNU/Linux 3.0.
- CUDA 10.1 is installed as a distribution package, so its libraries are under `/usr/lib/x86_64-linux-gnu`.
- The machine is x86_64, with a GeForce 1050 Ti card.
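The caveat about supporting "the entire grid as a cooperative group" boils down to two conditions: the device must report the `cudaDevAttrCooperativeLaunch` attribute, and every block of the grid must be able to be resident on the GPU at the same time. The sketch below only captures the arithmetic of the second condition in plain host-side C++. The function name `grid_fits_cooperatively` is hypothetical; in real CUDA code its inputs would come from `cudaOccupancyMaxActiveBlocksPerMultiprocessor()` and from `cudaDeviceGetAttribute()` with `cudaDevAttrMultiProcessorCount`.

```cpp
#include <cassert>

// Hypothetical helper (not part of the CUDA API): returns true if a grid of
// numBlocks blocks can be fully co-resident, which a grid-wide sync requires.
//   maxActiveBlocksPerSM: e.g. from cudaOccupancyMaxActiveBlocksPerMultiprocessor()
//   numSMs:               e.g. from cudaDeviceGetAttribute(..., cudaDevAttrMultiProcessorCount, ...)
bool grid_fits_cooperatively(long long numBlocks, int maxActiveBlocksPerSM, int numSMs) {
    // Occupancy figures reported by the runtime are never negative.
    assert(maxActiveBlocksPerSM >= 0 && numSMs >= 0);
    if (numBlocks <= 0) {
        return false;  // an empty grid is not a meaningful launch
    }
    // All blocks must fit simultaneously across all SMs.
    return numBlocks <= static_cast<long long>(maxActiveBlocksPerSM) * numSMs;
}
```

For example, with illustrative figures of 32 resident blocks per SM on 6 SMs, the 2-block grid launched in `foo.cu` fits, while a 200-block grid (more than 32 × 6 = 192) would not.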