Does Valgrind support profiling SYCL applications?


I'm trying to understand Valgrind's support for different programming languages. Specifically, does Valgrind support SYCL applications? If it does, how do I profile a SYCL application with it? If not, why not?

I searched for documentation on SYCL profiling. I found that SYCL has its own profiler, and I found a blog post about debugging SYCL with Valgrind, but I couldn't find any documentation on profiling SYCL with Valgrind.


There are 2 best solutions below

0
On

No, Valgrind doesn't support any form of partitioned execution.

The component that executes on the CPU should be OK to run in Valgrind. But Valgrind contains no code to instrument the part that runs on a GPU, FPGA, or DSP. There is also a major conceptual difference between the execution models. On CPUs, Valgrind runs with a global lock: it serializes threads so that only one runs at a time, and the program behaves as if there were just one CPU. GPUs, by contrast, are massively parallel. If you could only use one GPU element at a time, I imagine it would be infeasibly slow.

0
On

It depends on the SYCL implementation and the backend / target device.

Valgrind is not aware of anything that happens on accelerators, so it cannot instrument or profile kernels running on a GPU.

However, there are SYCL implementations that support executing kernels on the host as regular C++ code. This is usually called a "library-only" implementation, because the SYCL implementation behaves like a regular C++ library in this scenario. In that case, all the usual C++ debugging and profiling tools like gdb or Valgrind will work as usual with the entire application, including kernel code. This mode is supported in particular by hipSYCL/Open SYCL.
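To illustrate why host execution makes kernel code visible to Valgrind, here is a minimal sketch in plain C++. It does not use the SYCL API: `host_parallel_for` and `vector_add` are hypothetical names that only mimic the idea of a library-only runtime calling the kernel as an ordinary function for each work-item.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library-only runtime (NOT the SYCL API):
// the "kernel" is an ordinary C++ callable invoked once per work-item
// on the host CPU.
template <typename Kernel>
void host_parallel_for(std::size_t global_size, Kernel kernel) {
    for (std::size_t id = 0; id < global_size; ++id) {
        kernel(id);  // a plain function call that host tools can instrument
    }
}

// Element-wise vector addition written as a per-work-item kernel.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());  // inputs must have the same length
    std::vector<float> out(a.size());
    host_parallel_for(a.size(),
                      [&](std::size_t i) { out[i] = a[i] + b[i]; });
    return out;
}
```

If `vector_add` is called from a `main` function and the resulting binary is run under Valgrind (for example with `valgrind --tool=callgrind ./app`, where `./app` is a placeholder name), the lambda body is host machine code like any other, so its cost shows up in the profile. A real library-only implementation would likely spread work-items across host threads rather than use a single loop, but the principle is the same.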

If you run on a GPU, the native profiling and debugging tools for that GPU backend will generally work. For example, if you run your SYCL code through a SYCL implementation with a CUDA backend (such as DPC++ or hipSYCL/Open SYCL), you will be able to use NVIDIA's tools. This is because, from the perspective of the tools, the SYCL application looks and behaves just like any CUDA application.

I'm not sure what you mean by "SYCL has its own profiler". SYCL is a standard and as such does not define any debugging or profiling tools. Some SYCL implementations may come with their own tooling.