CUDA dynamic parallelism with Driver API

Asked by FHoenig on 07 January 2015 at 22:35 (983 views)

I'm trying to compile and link a kernel that uses dynamic parallelism and run it through the CUDA driver API on a GK110.

I compile the .cu source file in Visual Studio to a PTX file, with the relocatable device code flag and compute_35, sm_35, and the CUDA linker then adds cudadevrt.lib (at least it tries to, according to the linker invocation). When I call cuModuleLoad on the ptx .obj, it reports unsupported device code. There is also a .device-link.obj, which seems unrealistically small, and none of the driver API functions seem to recognize it as a valid image. Inspecting the PTX file, I can see that it contains a generated call to the kernel launch function, as described in the CUDA documentation (the section on dynamic parallelism from PTX).

How can I link the proper device code such that the dynamic kernel invocation works?

(this is CUDA 6.5 on Win64 with VC2013)


kunzmi On 08 January 2015 at 00:52 BEST ANSWER

You need to do the linking yourself when loading the PTX file, using the linker provided by the CUDA driver API:

Compile the .cu source file to PTX with the relocatable device code flag.

In your app:

Create a linker instance with cuLinkCreate()
Add the PTX file using cuLinkAddFile() or cuLinkAddData()
Add cudadevrt.lib using cuLinkAddFile() or cuLinkAddData()
Call cuLinkComplete(), which returns the linked binary; you can then load it as usual (e.g. with cuModuleLoadDataEx())
Destroy the linker instance with cuLinkDestroy() once the module has been loaded (the linker state owns the returned binary)
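
The steps above can be sketched as host code roughly like the following. This is a minimal sketch and not a tested build: the file names kernel.ptx and cudadevrt.lib (give the full path from your toolkit's lib directory), the kernel name parent, and the CHECK macro are assumptions made for illustration. It also assumes the kernel is declared extern "C" so its name is not mangled, and that the PTX was produced with something like nvcc -ptx -rdc=true -arch=compute_35 kernel.cu -o kernel.ptx.

```cuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a readable message on any driver API error.
#define CHECK(call)                                                     \
    do {                                                                \
        CUresult r_ = (call);                                           \
        if (r_ != CUDA_SUCCESS) {                                       \
            const char *msg_ = nullptr;                                 \
            cuGetErrorString(r_, &msg_);                                \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         msg_ ? msg_ : "unknown error");                \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

int main() {
    // Initialise the driver API and make the primary context of device 0 current.
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // 1. Create a linker instance (no JIT options in this sketch).
    CUlinkState link;
    CHECK(cuLinkCreate(0, nullptr, nullptr, &link));

    // 2. Add the relocatable PTX (assumed file name).
    CHECK(cuLinkAddFile(link, CU_JIT_INPUT_PTX, "kernel.ptx",
                        0, nullptr, nullptr));

    // 3. Add the device runtime library (assumed path; libcudadevrt.a on Linux).
    CHECK(cuLinkAddFile(link, CU_JIT_INPUT_LIBRARY, "cudadevrt.lib",
                        0, nullptr, nullptr));

    // 4. Finish linking. The returned image is owned by the linker state.
    void *cubin = nullptr;
    size_t cubinSize = 0;
    CHECK(cuLinkComplete(link, &cubin, &cubinSize));

    // Load the linked image BEFORE destroying the linker that owns it.
    CUmodule mod;
    CHECK(cuModuleLoadData(&mod, cubin));

    // 5. The linker instance is no longer needed.
    CHECK(cuLinkDestroy(link));

    // Look up and launch the parent kernel (assumed: extern "C", no parameters).
    CUfunction parent;
    CHECK(cuModuleGetFunction(&parent, mod, "parent"));
    CHECK(cuLaunchKernel(parent, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, nullptr));
    CHECK(cuCtxSynchronize());

    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

If linking fails, passing CU_JIT_ERROR_LOG_BUFFER and CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES as options to cuLinkCreate() should make the linker's error text available, which tends to be more informative than the bare error code.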
