CUDA dynamic parallelism with Driver API


I'm trying to compile and link a dynamic kernel and use it with the CUDA driver API on a GK110.

I compile the .cu source file to PTX in Visual Studio with the relocatable device code flag set and compute_35, sm_35 as the target. The CUDA linker then adds cudadevrt.lib (at least, according to the linker invocation, it tries to). When I call cuModuleLoad on the resulting PTX .obj, it reports unsupported device code.

There is also a .device-link.obj, which seems unrealistically small, and none of the driver API functions recognize it as a valid image. Inspecting the PTX file, I can see that it contains a call to the kernel launch function, as described in the "Dynamic Parallelism from PTX" section of the CUDA documentation.

How can I link the proper device code such that the dynamic kernel invocation works?

(this is CUDA 6.5 on Win64 with VC2013)

Best answer:

You need to do the linking at load time, using the JIT linker provided by the driver API:

  • Compile the .cu source file to PTX with relocatable device code enabled (nvcc -rdc=true, e.g. nvcc -arch=sm_35 -rdc=true -ptx kernel.cu)

In your app:

  • Create a linker instance with cuLinkCreate()
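  • Add the PTX file using cuLinkAddFile() or cuLinkAddData() with input type CU_JIT_INPUT_PTX
  • Add cudadevrt.lib using cuLinkAddFile() or cuLinkAddData() with input type CU_JIT_INPUT_LIBRARY
  • Call cuLinkComplete(), which returns the linked binary (a cubin) that you can then load as usual, e.g. with cuModuleLoadDataEx(). Load it before the next step, because the image is owned by the linker instance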
  • Destroy the linker instance with cuLinkDestroy()