Why would clEnqueueNDRangeKernel crash with a floating-point exception?

178 Views Asked by At

I'm trying to launch a certain kernel using clEnqueueNDRangeKernel(), from within a C++ program. But instead of enqueuing, or returning an error, it gets a floating-point exception signal (SIGFPE).

For IP reasons which I can't go into, it's difficult for me to provide an example triggering this signal. But - there doesn't seem to be any legitimate reason for this to occur. Are there known cases of that function itself actually performing an invalid floating-point operation?

1

There are 1 best solutions below

0
On

tl;dr: It's a divide-by zero due to a problem with your dimensions.

With NVIDIA CUDA's OpenCL library (at least with v11.2.152), passing workgroup dimensions and overall grid dimensions of different dimensionality may indeed trigger such a situation. OpenCL users have reported similar behavior in the past.

NVIDIA has deplorably chosen to provide its libraries as closed-source only, so I can only speculate about the reason, but it seems to be the following: When you construct a cl::NDRange with two dimensions, the third value in the international representation is initialized to 0 (explicitly and necessarily, or just sometimes). Now, if you read the documentation for clEnqueueNDRangeKernel() carefully, you'll notice that the both the global dimensions and the local dimensions must have the same dimensionality, i.e. same number of dimensions; clEnqueueNDRangeKernel() probably assumes this is the case, and calculates the number of grid blocks (number of workgroups) in the third dimension using global_dims_it_got[i] / local_dims_it_got[i] for every dimension of the global dimensions it received. Thus when the global dimensionality is higher, it ends up dividing by 0. That triggers SIGFPE, which, despite its name, is not only used for invalid floating-point operations, but rather any arithmetic error.