I am trying to analyze the execution time of my function on the device. I read this link: https://docs.oneapi.com/versions/latest/dpcpp/iface/event.html but I did not find any information in the documentation about sycl::info::event_profiling that would help me understand exactly what command_submit, command_start, and command_end correspond to. For example, this is part of my code (the kernel):
    auto event = gpuQueue.submit([&](sycl::handler &h) {
        // local copy of fun
        auto f = fun;
        sycl::accessor in_accessor(in_buffer, h, sycl::read_only);
        sycl::accessor out_accessor(out_buffer, h, sycl::write_only);
        h.parallel_for(n_item, [=](sycl::id<1> index) {
            out_accessor[index] = f(in_accessor[index]);
        });
    });
    event.wait();
    auto end_overall = std::chrono::system_clock::now();
    cl_ulong submit_time = event.template get_profiling_info<
        cl::sycl::info::event_profiling::command_submit>();
    cl_ulong start_time = event.template get_profiling_info<
        cl::sycl::info::event_profiling::command_start>();
    cl_ulong end_time = event.template get_profiling_info<
        cl::sycl::info::event_profiling::command_end>();
What I want to understand is whether cl::sycl::info::event_profiling::command_submit marks the submission of the whole command group or just of the parallel_for.
It is a bit clearer in the SYCL 2020 specification:
command_submit is the timestamp of the command group submission to the SYCL runtime.
command_start is the timestamp at which the parallel_for actually starts executing.
command_end is the timestamp at which the parallel_for completes.
So your kernel execution time on the device is command_end - command_start, whereas the total processing time for the command group (i.e. including potential copies, runtime overhead, etc.) is command_end - command_submit.
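As a rough illustration of the arithmetic, here is a minimal sketch in plain C++ that turns the three timestamps into durations, always subtracting the earlier timestamp from the later one. The struct ProfilingTimestamps and the helper functions are hypothetical names made up for this example; they are not part of SYCL. The sketch assumes the timestamps are the uint64_t nanosecond values that SYCL 2020 returns from get_profiling_info.

```cpp
#include <cstdint>

// Hypothetical helper type (not part of SYCL): the three profiling
// timestamps of one event, assumed to be in nanoseconds.
struct ProfilingTimestamps {
    std::uint64_t submit;  // command_submit: command group handed to the runtime
    std::uint64_t start;   // command_start: kernel begins executing on the device
    std::uint64_t end;     // command_end: kernel has finished executing
};

// Time the kernel itself spent running on the device.
constexpr std::uint64_t kernel_time_ns(const ProfilingTimestamps& t) {
    return t.end - t.start;
}

// Time between submission and the kernel actually starting
// (scheduling, data transfers, runtime overhead).
constexpr std::uint64_t queue_delay_ns(const ProfilingTimestamps& t) {
    return t.start - t.submit;
}

// Total time for the command group, from submission to completion.
constexpr std::uint64_t total_time_ns(const ProfilingTimestamps& t) {
    return t.end - t.submit;
}

// Convert nanoseconds to milliseconds for printing.
constexpr double ns_to_ms(std::uint64_t ns) {
    return static_cast<double>(ns) / 1.0e6;
}

// With a real event, the struct could be filled like this (needs a SYCL
// compiler, so it is left as a comment here):
//   ProfilingTimestamps t{
//       event.get_profiling_info<sycl::info::event_profiling::command_submit>(),
//       event.get_profiling_info<sycl::info::event_profiling::command_start>(),
//       event.get_profiling_info<sycl::info::event_profiling::command_end>()};
```

Note that the queue has to be created with the sycl::property::queue::enable_profiling property for these queries to work; otherwise get_profiling_info throws an exception.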