I have several nested loops, for example:
for (int i = 0; i < 1000; i++)
    for (int j = 0; j < 1000; j++)
    {
        for (int k = i * 5; k < i * 5 + 5; k++)
            for (int l = j * 5; l < j * 5 + 5; l++)
            {
                marrytemp = A[i] + B[j] + marry;
            }
        marry[i, j] = marrytemp;
    }
How can I write this as an OpenCL kernel?
Write the kernel to handle the two inner loops (k, l), then enqueue it as a 2D kernel whose global size matches the ranges of i and j (1000 x 1000 here), so that each work-item computes one (i, j) result.
Edit to add outline of kernel:
The kernel would be something along the lines of:
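A rough sketch of such a kernel follows. It rests on assumptions that are not in the question: the inner loops are meant to accumulate `A[k] + B[l]` into a running sum, `A` and `B` hold at least five times as many elements as the global size in their dimension, and the output is stored row-major in a flat buffer. The name `marry_kernel` and the `shim_*` variables are made up for illustration. The block at the top is only a shim so the file can also be compiled as ordinary C for a quick CPU check; an OpenCL compiler skips it.

```c
/* Host-C shim (illustrative only): an OpenCL C compiler defines
   __OPENCL_VERSION__ and therefore skips this whole section. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t shim_gid[2];    /* fake global IDs, set by the caller */
static size_t shim_gsize[2];  /* fake global sizes, set by the caller */
static size_t get_global_id(unsigned dim)   { return shim_gid[dim]; }
static size_t get_global_size(unsigned dim) { return shim_gsize[dim]; }
#endif

#define BLOCK 5  /* length of the k and l ranges in the question */

/* One work-item replaces one iteration of the outer (i, j) loops. */
__kernel void marry_kernel(__global const float *A,
                           __global const float *B,
                           __global float *marry)
{
    const size_t i = get_global_id(0);
    const size_t j = get_global_id(1);
    const size_t width = get_global_size(1);  /* number of j values */

    /* Assumed meaning of the loop body: sum A[k] + B[l] over the block. */
    float marrytemp = 0.0f;
    for (size_t k = i * BLOCK; k < i * BLOCK + BLOCK; k++)
        for (size_t l = j * BLOCK; l < j * BLOCK + BLOCK; l++)
            marrytemp += A[k] + B[l];

    /* Flat row-major index standing in for marry[i, j]. */
    marry[i * width + j] = marrytemp;
}
```

The outer loops disappear from the kernel body: the runtime supplies `i` and `j` through `get_global_id`, so each work-item only runs the 5 x 5 inner loops.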
And then it would be called something like:
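A hedged sketch of the host-side call is below. It assumes a kernel whose three arguments are the buffers for `A`, `B` and the output, and that the queue, kernel and buffers already exist. The function name `enqueue_marry` and the parameter names are hypothetical; `clSetKernelArg`, `clEnqueueNDRangeKernel` and `clFinish` are the standard OpenCL host API calls.

```c
#include <stddef.h>
#include <CL/cl.h>

/* Hypothetical helper: all handles are assumed to be created elsewhere. */
cl_int enqueue_marry(cl_command_queue command_queue, cl_kernel kernel,
                     cl_mem bufA, cl_mem bufB, cl_mem bufMarry)
{
    cl_int err;

    /* Bind the three buffers to kernel arguments 0, 1 and 2. */
    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &bufA);
    if (err != CL_SUCCESS) return err;
    err = clSetKernelArg(kernel, 1, sizeof(cl_mem), &bufB);
    if (err != CL_SUCCESS) return err;
    err = clSetKernelArg(kernel, 2, sizeof(cl_mem), &bufMarry);
    if (err != CL_SUCCESS) return err;

    /* 2D range: dimension 0 plays the role of i, dimension 1 of j. */
    const size_t global_size[2] = {1000, 1000};

    /* Passing NULL as the local size lets the runtime pick work-group sizes. */
    err = clEnqueueNDRangeKernel(command_queue, kernel, 2, NULL,
                                 global_size, NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Block until the kernel has finished. */
    return clFinish(command_queue);
}
```

The two entries of `global_size` take over the bounds of the outer `i` and `j` loops, so changing the problem size only means changing this array and the buffer sizes.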
Both of these need additional support code (such as creating command_queue and kernel) and have not been compiled. They are just to give you the idea of how to split your four nested loops into an OpenCL kernel.