CUDA __device__ member function with explicit template instantiation


I have a templated struct and I want to speed up compilation by separating its declaration from its implementation, using explicit template instantiation. (I'm using Kokkos 4.1 with the CUDA backend, GCC 11.3 and CUDA 11.8.)

I have a struct that looks like this:

template<OrbitType orbit_t, bool relativistic, typename reader_t, typename T = typename reader_t::T, typename Space = typename reader_t::Space>
struct Orbit {
    MetricsAndFields<reader_t> fields;
    Parameters params;

    Orbit() = default;
    Orbit(MetricsAndFields<reader_t> fields, Parameters params) : fields(fields), params(params) {}
    
    __host__ __device__  T eom_denominator(const Particle<T>& p);
};

Now in my .cpp file, I explicitly instantiate the struct as needed:

template struct Orbit<OrbitType::TYPE_I, false, MyReader<double, Kokkos::HostSpace>, double, Kokkos::HostSpace>;
// #ifdef ENABLE_GPU
template struct Orbit<OrbitType::TYPE_I, false, MyReader<double, Kokkos::CudaSpace>, double, Kokkos::CudaSpace>;

and obviously implement the function:

template<OrbitType orbit_t, bool relativistic, typename reader_t, typename T, typename Space>
__host__ __device__ T Orbit<orbit_t, relativistic, reader_t, T, Space>::eom_denominator(const Particle<T>& p) {
    // Implementation
}
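
For reference, here is a minimal single-file sketch of the layout described above. It is not my real code: `Particle::v`, `Orbit::scale`, the reduced template parameter list, the placeholder arithmetic and `demo_host_call` are hypothetical stand-ins, and the `#ifndef __CUDACC__` fallback only exists so the sketch also builds with a plain host compiler. The one thing it is meant to show is the ordering: an explicit instantiation definition of a class template only instantiates the members that are already defined at that point, so the member definition comes first.

```cpp
// orbit_sketch.cpp: simplified, hypothetical stand-ins for the types in the question.
#include <cassert>

// Fallback so the sketch also builds with a plain host compiler,
// where the CUDA qualifiers are not defined.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

enum class OrbitType { TYPE_I, TYPE_II };

template <typename T>
struct Particle {
    T v;  // hypothetical field
};

// --- What would live in the header: declaration only ---
template <OrbitType orbit_t, bool relativistic, typename T>
struct Orbit {
    T scale;  // hypothetical stand-in for `fields` and `params`

    __host__ __device__ T eom_denominator(const Particle<T>& p) const;
};

// --- What would live in the source file ---
// 1. The member definition comes first.
template <OrbitType orbit_t, bool relativistic, typename T>
__host__ __device__ T
Orbit<orbit_t, relativistic, T>::eom_denominator(const Particle<T>& p) const {
    return scale * p.v;  // placeholder arithmetic, not the real equation
}

// 2. The explicit instantiation definition comes afterwards, so that it
//    also instantiates eom_denominator for this set of arguments.
template struct Orbit<OrbitType::TYPE_I, false, double>;

// Host-side smoke test of the instantiated member (hypothetical helper).
inline double demo_host_call() {
    Orbit<OrbitType::TYPE_I, false, double> orbit{2.0};
    Particle<double> p{1.5};
    double d = orbit.eom_denominator(p);
    assert(d == 3.0);
    return d;
}
```

This only covers the ordering inside one translation unit. As far as I understand, calling such a function from device code in a different translation unit also requires relocatable device code (nvcc's `-rdc=true`), which this sketch does not exercise.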

When compiling, I get the following error: ptxas fatal : Unresolved extern function '_ZN5OrbitIL9OrbitType1ELb0E8MyReaderIdN6Kokkos9CudaSpaceEEdS3_E15eom_denominatorERK8ParticleIdE', which means that the device code cannot find my templated member function.

When I drop the __device__ qualifier from the function, it compiles fine (with a warning that calling a __host__ function from device code is not allowed). So the problem comes from __device__. I also tried __noinline__, but I still get the same error.

Is there any way around that? Or am I forced to have slow compilation times?
