Is there a list of headers that can be used in a string compiled with NVRTC?


(Using NVRTC run-time compiler)

I have a CUDA kernel stored as a string:

R"(
        extern "C" __global__ void test1(float * a, float * b, float *c)
        {
            int id= blockIdx.x * blockDim.x + threadIdx.x;
            c[id]=a[id]+b[id];
        }
)"

This compiles successfully to PTX, which I load through the driver API and use in my program to compute c = a + b.

But when I try to include a header in order to use an algorithm:

R"(
        #include <climits>

        extern "C" __global__ void test1(float * a, float * b, float *c, int * gpuOffset)
        {
            int id=blockIdx.x * blockDim.x + threadIdx.x;
            device_vector<int> dv;
            c[id]=a[id]+b[id];
        }


)"

it returns an error saying

test1.cu(23): catastrophic error: cannot open source file "climits"

1 catastrophic error detected in the compilation of "test1.cu".
Compilation terminated.

or

test1.cu(28): error: identifier "device_vector" is undefined

depending on which header is included or which class from it is used (such as device_vector).
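
As far as I understand, NVRTC only sees headers it is explicitly given: either through an include-path option such as `--include-path=<dir>` passed to `nvrtcCompileProgram`, or as in-memory strings passed through the `headers` and `includeNames` parameters of `nvrtcCreateProgram`. Below is a minimal sketch of how the second route could be prepared on the host. The `InMemoryHeaders` helper, the `makeExampleHeaders` function, and the header `my_add.h` with its contents are all hypothetical names made up for illustration. The NVRTC call itself appears only in a comment so the snippet builds without the CUDA toolkit.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of NVRTC): owns header names and sources and
// exposes them as the two parallel C-string arrays that nvrtcCreateProgram
// takes as `includeNames` and `headers`.
struct InMemoryHeaders {
    std::vector<std::string> names;
    std::vector<std::string> sources;

    void add(std::string name, std::string source) {
        names.push_back(std::move(name));
        sources.push_back(std::move(source));
    }

    // The returned pointers are valid only while this object is alive
    // and no further headers are added.
    std::vector<const char*> namePtrs() const { return toPtrs(names); }
    std::vector<const char*> sourcePtrs() const { return toPtrs(sources); }

private:
    static std::vector<const char*> toPtrs(const std::vector<std::string>& v) {
        std::vector<const char*> out;
        out.reserve(v.size());
        for (const auto& s : v) out.push_back(s.c_str());
        return out;
    }
};

// Illustrative header only: both the name and the contents are assumptions.
inline InMemoryHeaders makeExampleHeaders() {
    InMemoryHeaders h;
    h.add("my_add.h",
          "__device__ inline float my_add(float x, float y) { return x + y; }\n");
    return h;
}

// Intended use with NVRTC (needs <nvrtc.h>; not compiled in this sketch):
//   InMemoryHeaders h = makeExampleHeaders();
//   std::vector<const char*> names = h.namePtrs();
//   std::vector<const char*> srcs  = h.sourcePtrs();
//   nvrtcProgram prog;
//   nvrtcCreateProgram(&prog, kernelSource, "test1.cu",
//                      static_cast<int>(names.size()),
//                      srcs.data(), names.data());
```

With this, the kernel string would contain `#include "my_add.h"` and NVRTC would resolve it from the in-memory table rather than from disk. This only covers headers written as device code; it does not by itself make host-only library headers usable inside a kernel.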

Also, the documentation indicates that both cuFFT and Thrust are usable only from the host side, so it seems I can't use any "partial" algorithm that I want to run on each thread block independently.

Is there a list of headers for CUDA-supported algorithms that can be used per block, like this:

R"(
        #include "driver_api_fft.h"
        #include "driver_api_ifft.h"
        extern "C" __global__ void test1(float * a, float * b, float *c)
        {
            int id=blockIdx.x * blockDim.x + threadIdx.x;
            fft(a,id,1024);
            ifft(b,id,1024);
            c[id]=a[id]+b[id];
        }
)"

so that it compiles and runs on any target machine? Or is it possible to link those algorithm libraries (Thrust for device_vector) through the PTX linker from the host side, so that I can somehow use them from the compiled kernel? If neither is possible, do I need to write a Fourier transform myself and make it "fast" by implementing the algorithms on my own?
