It is possible to use nvprof to read the shared-memory bank conflict counters of a CUDA executable:
nvprof --events shared_st_bank_conflict,shared_ld_bank_conflict my_cuda_exe
However, this does not work for code that uses OpenCL rather than CUDA.
- Is there any way to read these counters outside nvprof from an OpenCL environment, maybe directly from the PTX?
- Alternatively, is there any way to convert the PTX assembly generated by the NVIDIA OpenCL compiler (retrieved with clGetProgramInfo using CL_PROGRAM_BINARIES) into a CUDA kernel, run it with cuModuleLoadDataEx, and thus be able to use nvprof?
- Is there any CPU simulation backend that lets you set parameters such as the bank size?
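Whatever tool ends up answering the last question, the effect of the bank size can at least be reasoned about on the CPU. The plain-C sketch below is a toy model, not a simulator and not how nvprof counts its events. It assumes shared memory is split into `num_banks` banks of `bank_width` bytes (32 banks of 4 bytes is the common NVIDIA configuration), that consecutive words map to consecutive banks, and that threads reading the same word are served by a broadcast. `conflict_degree` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_THREADS 64

/* Toy model of shared-memory banking (an assumption, not NVIDIA's exact
 * hardware logic): consecutive bank_width-byte words map to consecutive
 * banks, and threads reading the same word are served by one broadcast.
 * Returns how many serialized passes the warp access needs:
 * 1 = conflict-free, n = n-way bank conflict. */
static int conflict_degree(const uint32_t *byte_addr, int n_threads,
                           int num_banks, int bank_width)
{
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    assert(num_banks > 0 && bank_width > 0);

    int worst = 1;
    for (int bank = 0; bank < num_banks; ++bank) {
        uint32_t words[MAX_THREADS]; /* distinct words hitting this bank */
        int distinct = 0;
        for (int t = 0; t < n_threads; ++t) {
            uint32_t word = byte_addr[t] / (uint32_t)bank_width;
            if (word % (uint32_t)num_banks != (uint32_t)bank)
                continue;
            /* Same word requested twice is a broadcast, not a conflict. */
            int seen = 0;
            for (int k = 0; k < distinct; ++k) {
                if (words[k] == word) {
                    seen = 1;
                    break;
                }
            }
            if (!seen)
                words[distinct++] = word;
        }
        if (distinct > worst)
            worst = distinct;
    }
    return worst;
}
```

With 32 banks of 4 bytes, a warp reading `float` elements at a stride of 1 is conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 32 sends every thread to bank 0 (32-way). Changing `bank_width` to 8 lets you explore a 64-bit bank mode under the same assumptions.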
Additional option:
- Use an OpenCL-to-CUDA converter that covers features missing from CUDA, such as vloadn/vstoren, float16 and the various vector accessors. A plain #define approach only works for simple kernels. Is there any tool that provides this?
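To illustrate why #define alone falls short, here is a minimal sketch, in plain C for illustration, of the kind of helper a conversion layer would need for vload4/vstore4. OpenCL's vload4(offset, p) reads four elements starting at p + offset * 4, and p only needs the alignment of the element type, whereas a direct float4 load in CUDA needs 16-byte alignment, so the helper copies element by element. `float4_t`, `vload4_f` and `vstore4_f` are hypothetical names; a CUDA port would use the built-in float4 and mark the functions `__device__`, and would still need variants for every element type and width (2, 3, 4, 8, 16).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-wide float vector; a CUDA port would use the built-in float4. */
typedef struct { float x, y, z, w; } float4_t;

/* Mirrors OpenCL vload4(offset, p): read 4 floats starting at p + offset * 4.
 * Element-wise copy, because p is only guaranteed to be float-aligned. */
static float4_t vload4_f(size_t offset, const float *p)
{
    assert(p != NULL);
    const float *q = p + offset * 4;
    float4_t v = { q[0], q[1], q[2], q[3] };
    return v;
}

/* Mirrors OpenCL vstore4(data, offset, p): write 4 floats starting at
 * p + offset * 4. */
static void vstore4_f(float4_t data, size_t offset, float *p)
{
    assert(p != NULL);
    float *q = p + offset * 4;
    q[0] = data.x;
    q[1] = data.y;
    q[2] = data.z;
    q[3] = data.w;
}
```

For example, with `float buf[8] = {0, 1, 2, 3, 4, 5, 6, 7}`, `vload4_f(1, buf)` yields the elements 4, 5, 6 and 7.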
No. There is no such way in CUDA either, nor for compute shaders in OpenGL, DirectX or Vulkan.
No. OpenCL PTX and CUDA PTX are not the same and cannot be used interchangeably.
Not that I am aware of.