OpenCL equivalent of finding consecutive indices in CUDA


In CUDA, to cover multiple blocks and thus increase the range of array indices, we do something like this:

Host-side code:

 dim3 dimGrid(9, 1);   // 9 blocks will be launched
 dim3 dimBlock(16, 1); // each block has 16 threads, so the grid
                       // has 16 x 9 = 144 threads in total

Device-side code:

 ...
 ...     
 idx = blockIdx.x * blockDim.x + threadIdx.x; // idx ranges from 0 to 143
 a[idx] = a[idx] * a[idx];
 ...
 ...    

What is the equivalent in OpenCL for achieving this?


There are 2 best solutions below

On BEST ANSWER

On the host, when you enqueue your kernel using clEnqueueNDRangeKernel, you have to specify the global and local work size. For instance:

size_t global_work_size[1] = { 144 }; // 16 * 9 == 144
size_t local_work_size[1] = { 16 };
clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
                       global_work_size, local_work_size,
                       0, NULL, NULL);
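
One detail worth keeping in mind when porting a CUDA launch configuration: CUDA takes the number of blocks, while OpenCL takes the total number of work-items, and in OpenCL 1.x the global work size must be a multiple of the local work size. The sketch below shows the arithmetic in plain C. The helper names `global_size_from_blocks` and `round_up_global_size` are made up for illustration and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): converts a CUDA-style
 * launch of num_blocks blocks with threads_per_block threads each into
 * the OpenCL global work size for one dimension. */
static size_t global_size_from_blocks(size_t num_blocks, size_t threads_per_block)
{
    return num_blocks * threads_per_block;
}

/* Hypothetical helper: rounds n_items up to the next multiple of
 * local_size, so the global size is evenly divisible by the local size. */
static size_t round_up_global_size(size_t n_items, size_t local_size)
{
    assert(local_size > 0);
    size_t num_groups = (n_items + local_size - 1) / local_size;
    return num_groups * local_size;
}
```

For the launch in the question, `global_size_from_blocks(9, 16)` gives the 144 used above. If the array length were not a multiple of 16, the rounded-up global size would contain extra work-items, and the kernel would then need a bounds check on its index before touching the array.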

In your kernel, use:

size_t get_global_size(uint dim);
size_t get_global_id(uint dim);
size_t get_local_size(uint dim);
size_t get_local_id(uint dim);

to retrieve the global and local work sizes and indices respectively, where dim is 0 for x, 1 for y and 2 for z.

The equivalent of your idx is thus simply size_t idx = get_global_id(0); OpenCL computes the combined index for you, so no manual block-and-thread arithmetic is needed.
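
As a minimal sketch, the OpenCL counterpart of the CUDA device code from the question might look like the following. The kernel name `square` and the `float` element type are assumptions, since the question does not say what `a` holds. The kernel is written as a C string, the form in which it would be handed to clCreateProgramWithSource. The function `square_emulated` is only a plain-C stand-in that walks the same index space on the CPU, so the mapping can be checked without an OpenCL device.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C kernel source. The name `square` and the float element
 * type are assumptions made for this example. */
static const char *square_kernel_src =
    "__kernel void square(__global float *a)\n"
    "{\n"
    "    size_t idx = get_global_id(0); /* 0..143 for a global size of 144 */\n"
    "    a[idx] = a[idx] * a[idx];\n"
    "}\n";

/* CPU stand-in for the kernel: visits every (work-group, work-item)
 * pair once, the way a 1-D NDRange with no global offset would. */
static void square_emulated(float *a, size_t num_groups, size_t local_size)
{
    assert(a != NULL);
    for (size_t group = 0; group < num_groups; ++group) {
        for (size_t local = 0; local < local_size; ++local) {
            /* Same value get_global_id(0) would return in the kernel. */
            size_t idx = group * local_size + local;
            a[idx] = a[idx] * a[idx];
        }
    }
}
```

Calling `square_emulated(a, 9, 16)` on a 144-element array squares every element exactly once, which is what the kernel does when enqueued with a global size of 144 and a local size of 16.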

See the OpenCL Reference Pages.

On

Equivalences between CUDA and OpenCL are:

blockIdx.x * blockDim.x + threadIdx.x = get_global_id(0)

LocalSize = blockDim.x (get_local_size(0) inside the kernel)

GlobalSize = blockDim.x * gridDim.x (get_global_size(0) inside the kernel)
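
These identities can be checked with a short plain-C sketch that enumerates every work-item of a 1-D launch. The struct and function names are hypothetical. The fields mirror what get_group_id(0), get_local_id(0) and get_global_id(0) would return inside a kernel enqueued with no global offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the IDs one work-item sees in dimension 0. */
struct work_item_ids {
    size_t group_id;  /* CUDA: blockIdx.x,  OpenCL: get_group_id(0) */
    size_t local_id;  /* CUDA: threadIdx.x, OpenCL: get_local_id(0) */
    size_t global_id; /* OpenCL: get_global_id(0) */
};

/* Splits a global ID into its work-group and local parts. */
static struct work_item_ids ids_for(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    struct work_item_ids ids;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    ids.global_id = global_id;
    return ids;
}

/* Returns 1 if group_id * local_size + local_id equals the global ID
 * for every work-item of the launch, 0 otherwise. */
static int mapping_holds(size_t num_groups, size_t local_size)
{
    size_t global_size = num_groups * local_size; /* blockDim.x * gridDim.x */
    for (size_t gid = 0; gid < global_size; ++gid) {
        struct work_item_ids ids = ids_for(gid, local_size);
        if (ids.group_id >= num_groups) return 0;
        if (ids.group_id * local_size + ids.local_id != ids.global_id) return 0;
    }
    return 1;
}
```

For the launch in the question, `mapping_holds(9, 16)` covers global IDs 0 through 143, and the last work-item corresponds to block 8, thread 15.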