I'm building an OpenCL program using NVIDIA CUDA 11.2's OpenCL library (and its C++ bindings). After invoking cl::Program::build() successfully for a single device (passing a vector with a single device index), I obtain the sizes of the generated "binaries" using built_program.getInfo<CL_PROGRAM_BINARY_SIZES>(), which also succeeds but gives me 3 values: one non-zero value and two zeros. When I print the first binary, I see the PTX code I expect.
My question: Why am I given two (empty) extra binaries?
Even though the program is built only for the specific devices you specify (see the documentation for clBuildProgram), a binary entry is reported for each device associated with the program, which normally means each device in the context. In your case, you probably have three devices in the context (for example, three GPUs on your system). You built the program for a single device, so only that device's entry holds non-empty PTX; the other two entries have size zero.
Confusing? Sure. Convoluted? Yes. But is it entirely senseless? Admittedly, not really.
Digging around a bit further, it seems this is even officially documented (emphasis mine):
That is: not every device for which you built, but every device associated with the program, which is probably every device in the OpenCL context with which you created the program.
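
To tell which device each size belongs to, you can line the sizes up against the program's device list: the entries returned for CL_PROGRAM_BINARY_SIZES correspond one-to-one, in order, to the devices returned for CL_PROGRAM_DEVICES. Below is a minimal sketch of that pairing. The helper name devices_with_binaries is hypothetical (it is not part of OpenCL), and it takes plain std::vector inputs so it can be tried without a GPU; the comments show how the inputs would presumably be obtained from the C++ bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of OpenCL or its C++ bindings.
//
// Assumed way to obtain the inputs from a built cl::Program:
//   auto devices = built_program.getInfo<CL_PROGRAM_DEVICES>();
//   auto sizes   = built_program.getInfo<CL_PROGRAM_BINARY_SIZES>();
//   // device_names[i] = devices[i].getInfo<CL_DEVICE_NAME>();
//
// Returns the names of the devices whose binary size is non-zero,
// i.e. the devices the program was actually built for.
std::vector<std::string> devices_with_binaries(
    const std::vector<std::string>& device_names,
    const std::vector<std::size_t>& binary_sizes)
{
    // Both lists describe the same devices in the same order.
    assert(device_names.size() == binary_sizes.size());

    std::vector<std::string> built;
    for (std::size_t i = 0; i < binary_sizes.size(); ++i) {
        // A size of zero means no binary is available for that device.
        if (binary_sizes[i] != 0) {
            built.push_back(device_names[i]);
        }
    }
    return built;
}
```

With three devices in the context and a build for only the first, this would return just that first device's name, matching the "one non-zero value and two zeros" in the question.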