I am developing an application making use of OpenCL, targeted to 1.2 versión. I use DX11 interoperability to display the kernel results. I try my code in Intel (iGPU) and Nvidia platforms, in both I recon the same behaviour.
My call to clEnqueueNDRangeKernel is blocking the CPU thread. I have checked the documentation and I can not find an statement declaring the situations in which a kernel call may block. I have read in some forums that those things happens sometimes with some OpenCL implementations. The code is working properly and outputting valid results. The API does not return any error at any given point, all seems smooth.
I can not paste the full source but I will paste the in-loop part:
size_t local = 64;
size_t global = ctx->dec_in_host->horizontal_blocks * ctx->dec_in_host->vertical_blocks * local;
print_if_error(clEnqueueWriteBuffer(ctx->queue, ctx->blocks_gpu, CL_TRUE, 0, sizeof(block_input) * TOTAL_BLOCKS, ctx->blocks_host, 0, NULL, &ctx->blocks_copy_status), "copying data");
print_if_error(clEnqueueWriteBuffer(ctx->queue, ctx->dec_in_gpu, CL_TRUE, 0, sizeof(decoder_input), ctx->dec_in_host, 0, NULL, &ctx->frame_copy_status), "copying data");
if (ctx->mode == nv_d3d11_sharing)
print_if_error(ctx->fp_clEnqueueAcquireD3D11ObjectsNV(ctx->queue, 1, &(ctx->image_gpu), 0, NULL, NULL), "Adquring texture");
else if (ctx->mode == khr_d3d11_sharing)
print_if_error(ctx->fp_clEnqueueAcquireD3D11ObjectsKHR(ctx->queue, 1, &(ctx->image_gpu), 0, NULL, NULL), "Adquring texture");
t1 = clock();
print_if_error(clEnqueueNDRangeKernel(ctx->queue, ctx->kernel, 1, NULL, &global, &local, 0, NULL, &ctx->kernel_status), "kernel launch");
t2 = clock();
if (ctx->mode == nv_d3d11_sharing)
print_if_error(ctx->fp_clEnqueueReleaseD3D11ObjectsNV(ctx->queue, 1, &(ctx->image_gpu), 0, NULL, NULL), "Releasing texture");
else if (ctx->mode == khr_d3d11_sharing)
print_if_error(ctx->fp_clEnqueueReleaseD3D11ObjectsKHR(ctx->queue, 1, &(ctx->image_gpu), 0, NULL, NULL), "Releasing texture");
printf("Elapsed time %lf ms\n", (double)(t2 - t1)*1000 / CLOCKS_PER_SEC);
So my question is:
- ¿Do you know any reason why the clEnqueueNDRangeKernel would block?
- ¿Do you know if the Dx11 interop might cause this?
- ¿Do you know if some OpenCL configuration can create a syncronous kernel launch?
Thank you :)
EDIT 1: Thanks to doqtor comment I realize that commenting out parts of the kernel the kernel launch becomes asyncronous. The result is not Ok but I have some hint to work out the answer.