Throughput calculation in OpenCL


I am trying to calculate the throughput of my kernel, which is written in OpenCL, but I am not sure how to do that. I looked through the files generated after compilation and found a throughput value of 0.435 in the .attrb file, but I am not sure what that means. Is there any other way to find the throughput?


There are 2 best solutions below

0
On

This is a very vague question.

Do you mean the kernel alone, without loading the data?

What does the kernel do, what kind of hardware are you running it on, how is your data organized, and how do you manage your buffers?

Is everything in global memory? Are you also measuring latencies? Do you need to maximize the throughput? Are you going to optimize for specific hardware?

This raises many questions for me.

0
On

The throughput of a kernel in OpenCL is calculated as:

(NumReadBytes + NumWriteBytes)/ElapsedTime

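To measure the time, use a cl_event. The command queue must be created with CL_QUEUE_PROFILING_ENABLE, and the event must have completed before its timestamps are read.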

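#include <stdio.h>
#include <CL/cl.h>

/* Returns the execution time of the command behind `event` in milliseconds. */
double getDuration(cl_event event)
{
  cl_ulong start_time = 0, end_time = 0;
  /* Profiling timestamps are reported in nanoseconds. */
  clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                          sizeof(cl_ulong), &start_time, NULL);
  clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                          sizeof(cl_ulong), &end_time, NULL);
  return (double)(end_time - start_time) * 1e-6;  /* ns -> ms */
}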

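cl_event timer;

/* cq must have been created with CL_QUEUE_PROFILING_ENABLE. */
int ret = clEnqueueNDRangeKernel(cq, kernel, 1, p_global_work_offset, &global_work_size,
                                 &local_work_size, 0, NULL, &timer);

/* Wait for the kernel to finish; profiling info is not available before then. */
clWaitForEvents(1, &timer);

printf("G:%zu L:%zu T:%fms\n", global_work_size, local_work_size, getDuration(timer));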
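
Once the duration in milliseconds is available from getDuration, the formula above can be turned into a number. The following is only a minimal sketch: the helper name throughput_gbps and the choice of GB/s (with 1 GB = 1e9 bytes) are assumptions made for illustration and are not part of OpenCL.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): effective memory throughput
 * in GB/s, where 1 GB is taken as 1e9 bytes.
 * elapsed_ms is the kernel time in milliseconds, e.g. from getDuration(). */
double throughput_gbps(size_t bytes_read, size_t bytes_written, double elapsed_ms)
{
    /* Guard against a zero or invalid duration. */
    if (elapsed_ms <= 0.0)
        return 0.0;

    double total_bytes = (double)bytes_read + (double)bytes_written;
    double elapsed_s = elapsed_ms * 1e-3;   /* ms -> s */

    return total_bytes / elapsed_s / 1e9;   /* bytes/s -> GB/s */
}
```

For example, assuming a kernel in which each work-item reads two floats and writes one, the call could look like throughput_gbps(2 * global_work_size * sizeof(float), global_work_size * sizeof(float), getDuration(timer)). The byte counts have to come from what your kernel actually reads and writes; OpenCL does not report them.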