Is there a way to check at runtime for which CUDA compute capabilities the current program was compiled? Or do the arch=compute_xx,code=sm_xx flags set any defines which could be checked?
Background: for a deployed binary, I cannot make sure that users have a "correct" setup. I'd therefore like to compare their device's CC (obtained via cudaGetDeviceProperties) against the compiled CC before starting a lengthy initialization. Up until now, all I can do is launch a kernel and use cudaPeekAtLastError to exit when "no kernel image is available for execution on the device".
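For reference, a minimal sketch of that probe approach (probeKernel is just a placeholder name, and error handling is trimmed for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void probeKernel() {}

int main()
{
    // Report the device's compute capability for the comparison.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return 1;
    std::printf("Device CC: %d.%d\n", prop.major, prop.minor);

    // Launch a trivial kernel and check whether a matching image exists.
    probeKernel<<<1, 1>>>();
    cudaError_t err = cudaPeekAtLastError();
    if (err != cudaSuccess) {
        // e.g. "no kernel image is available for execution on the device"
        std::printf("Probe failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaDeviceSynchronize();
    return 0;
}
```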
EDIT: I've tried using cudaFuncGetAttributes like so (main.cu):
#include <cstdlib>
#include <iostream>

__global__ void cudaKernel() {}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, cudaKernel);
    if (err != cudaSuccess) {
        std::cout << "CUDA Error: " << cudaGetErrorString(err) << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
This works fine on a Linux machine with CUDA 11.2 and a GTX 1080 (CC 6.1). However, running on my local Windows machine (CUDA 11.8, GT 730 with CC 3.5) this returns CUDA Error: invalid device function, which indicates: "The requested device function does not exist or is not compiled for the proper device architecture."
Using cuobjdump on the exe tells me:
Fatbin ptx code:
================
arch = sm_35
code version = [7,8]
host = windows
compile_size = 64bit
compressed
Fatbin elf code:
================
arch = sm_35
code version = [1,7]
host = windows
compile_size = 64bit
I'm scratching my head over why cuobjdump is able to determine the arch, whereas cudaFuncGetAttributes fails even though it should be able to execute on my CC 3.5 device. I'm running driver version 456.71, which should work with CUDA 11.x.
The only way is what you are already doing, i.e. try to examine the function using cudaFuncGetAttributes, and catch the error if the function can't load because of an architecture mismatch or lack of a JIT path.

The underlying reason is that the only way the runtime or driver API can examine the payload of an object or cubin file is to try to load it into the context, and the act of loading is what fails when you have an incompatible architecture. It isn't that cudaFuncGetAttributes doesn't work; it is that the API is failing while trying to load a cubin file which can't be loaded into your context because of an architecture mismatch.

cuobjdump, on the other hand, is showing you the ELF headers in the files, using an ELF parsing library (NVIDIA's own, which I don't believe is public, but I could be wrong). If I remember correctly, there is a proprietary ELF section called nvinfo which contains all the CUDA-specific metadata. cuobjdump doesn't use the driver or runtime API. If you were really desperate to do this, you might be able to use some kind of ELF parsing library to extract that metadata and replicate the functionality yourself, although doing that well would require a lot of reverse engineering.