Can nvlink inline device functions from separate compilation units?

If the separate compilation units passed to nvlink contain CUDA kernels and device functions that call device functions marked __forceinline__, will those functions be inlined? Assume they would be inlined if all the source code were in a single file.

There is 1 best solution below

If the separate compilation units passed to nvlink contain CUDA kernels and device functions that call device functions marked __forceinline__, will those functions be inlined?

To the best of my knowledge, the CUDA device code linker can't do this. __forceinline__ is a compiler-level directive: inlining happens during compilation, and neither PTX nor SASS has a way to mark code as inlineable afterwards. If you try this, the CUDA device code compiler should emit a warning that an external inline function was used but not defined.

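If you want functions to be inlined, you have to (unsurprisingly) use a compiler, not a linker. In practice that means the definition of the function must be visible in every compilation unit that calls it, for example by placing it in a shared header.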
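As a rough illustration of letting the compiler see the body, here is a minimal sketch. Everything in it is an assumption made up for the example, not something from the question: the header name inline_math.cuh, the function scale_add, the kernel saxpy_kernel, and the host helper run_saxpy. It is written as one file so it can be compiled directly. In a real project the first part would sit in a header included by each .cu file.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// ---- would live in a shared header, e.g. inline_math.cuh (hypothetical) ----
// The full definition, not just a declaration, is visible to every
// compilation unit that includes it, so the compiler has a body to inline.
__device__ __forceinline__ float scale_add(float a, float x, float y)
{
    return a * x + y;
}

// ---- would live in a .cu file that includes the header (hypothetical) ----
__global__ void saxpy_kernel(float a, const float* x, const float* y,
                             float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = scale_add(a, x[i], y[i]);  // inlining candidate at compile time
    }
}

// Turn a CUDA error code into a C++ exception.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
}

// Host-side helper: copies inputs to the device, runs the kernel, and
// returns the result. For brevity, device buffers are not freed if a
// call fails part-way through.
std::vector<float> run_saxpy(float a, const std::vector<float>& x,
                             const std::vector<float>& y)
{
    if (x.size() != y.size()) {
        throw std::invalid_argument("x and y must have the same length");
    }
    const int n = static_cast<int>(x.size());
    std::vector<float> out(n);
    if (n == 0) {
        return out;
    }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dx = nullptr;
    float* dy = nullptr;
    float* dout = nullptr;
    check(cudaMalloc(&dx, bytes));
    check(cudaMalloc(&dy, bytes));
    check(cudaMalloc(&dout, bytes));
    check(cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(a, dx, dy, dout, n);
    check(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(out.data(), dout, bytes, cudaMemcpyDeviceToHost));

    check(cudaFree(dx));
    check(cudaFree(dy));
    check(cudaFree(dout));
    return out;
}
```

If scale_add were only declared in the header and defined in a different .cu file, the calling unit would have no body to inline, which is the situation the answer describes.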