I am using dynamic parallelism with CUDA 5.5 on an NVIDIA GeForce GTX 780, which has compute capability 3.5. I launch one kernel from inside another kernel, but the compiler gives me this error:
error : calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above
What am I doing wrong?
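You need to have nvcc generate code for compute capability 3.5, which your device supports. By default nvcc targets an older architecture, which is why the compiler rejects the nested kernel launch. You can fix this by adding the corresponding architecture option to the nvcc command line.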
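As a minimal sketch of what this looks like in practice, the example below launches a child kernel from a parent kernel. The build line in the leading comment is an assumption about a typical setup rather than a quote from your project: `-arch=sm_35` selects the compute capability 3.5 target, while `-rdc=true` (relocatable device code) and `-lcudadevrt` (the device runtime library) are what dynamic parallelism generally requires in addition. The file name `dynpar.cu` and the kernel names are hypothetical.

```cuda
// Assumed build command for a CC 3.5 device (adjust to your toolkit and paths):
//   nvcc -arch=sm_35 -rdc=true dynpar.cu -o dynpar -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: launched from device code by parentKernel.
__global__ void childKernel(int parentBlock)
{
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parentBlock);
}

// Parent kernel: the first thread of each block launches one child grid.
__global__ void parentKernel()
{
    if (threadIdx.x == 0) {
        childKernel<<<1, 4>>>(blockIdx.x);
    }
}

int main()
{
    parentKernel<<<2, 32>>>();

    // Catch launch-configuration errors from the host-side launch.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Wait for the parent grid and every child grid it launched.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the architecture option is omitted, a file like this should fail with the same "only allowed on the compute_35 architecture or above" message. If you build from an IDE rather than the command line, the same three settings have to be made in the project's CUDA properties.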
See the CUDA samples on dynamic parallelism for more detail. They include both the command-line options and the project settings for all supported operating systems.
http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-