TL;DR: Using native_log2() produces nondeterministic behavior in an OpenCL kernel, while using log2() produces deterministic behavior. Why is this happening?
I have the function below acting as a helper for an OpenCL kernel, and I was using the native_ version of log2 (native_log2) to improve performance.
When I compared the results produced by the kernel against those of the original program, I found that the kernel produces the right values in most cases but occasionally produces an incorrect one (about 30 incorrect values in 500k function calls). VERY IMPORTANT: The errors are not always in the same computations. I am processing multiple input files, and the errors seem to occur randomly in different sets of files across different runs. That is, the results are nondeterministic.
After some tests I narrowed the problem down to the function below and found that replacing native_log2 with log2 produces the correct value 100% of the time. All those typecasts look ugly, but the log2() and floor() functions only accept float/double arguments, while my input and output must be integers.
My device is an NVIDIA 940MX GPU, which only supports OpenCL 1.2. The OpenCL 1.2 documentation states:
A subset of functions from table 6.8 that are defined with the native_ prefix. These functions may map to one or more native device instructions and will typically have better performance compared to the corresponding functions (without the native__ prefix) described in table 6.8. The accuracy (and in some cases the input range(s)) of these functions is implementation-defined.
Clearly I should expect some loss of accuracy when using native_ functions, but the documentation says nothing about whether the resulting errors are deterministic.
Can someone give me directions on why I am facing this strange behavior?
    int xGetExpGolombNumberOfBits(int value){
        unsigned int uiLength2 = 1;
        unsigned int uiTemp2 = select((unsigned int)( value << 1 ), ( (unsigned int)( -value ) << 1 ) + 1, value <= 0);
        // These magic numbers (7 and 128) are substituting two constants for the sake of clarity
        while( uiTemp2 > 128 )
        {
            uiLength2 += ( 7 << 1 );
            uiTemp2 >>= 7;
        }
        return uiLength2 + (((int)floor(native_log2((float)uiTemp2))) << 1);
    }
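For cross-checking the kernel output on the host, a plain C port of the function that avoids floating point entirely might look like the sketch below. The names `floor_log2_uint` and `exp_golomb_bits_ref` are hypothetical, and the port assumes the same constants (7 and 128) as the function above. In OpenCL C, the integer built-in `clz` could play the role of the shift loop (`31 - clz(x)` for a nonzero 32-bit `x`), though its speed relative to native_log2 on this device is untested here.

```c
/* Hypothetical helper: floor(log2(x)) for x >= 1, using only integer shifts. */
static int floor_log2_uint(unsigned int x)
{
    int n = 0;
    while (x >>= 1) {
        ++n;                    /* count how many times x can be halved */
    }
    return n;
}

/* Hypothetical host-side reference for xGetExpGolombNumberOfBits.
 * Unsigned arithmetic is used for the shifts so the C code has no
 * signed-overflow issues; for in-range inputs it matches the original. */
static int exp_golomb_bits_ref(int value)
{
    unsigned int length = 1;
    unsigned int temp = (value <= 0)
        ? ((0u - (unsigned int)value) << 1) + 1u   /* non-positive: odd code */
        : ((unsigned int)value << 1);              /* positive: even code */

    /* 7 and 128 are the same placeholder constants as in the kernel helper. */
    while (temp > 128u) {
        length += (7u << 1);
        temp >>= 7;
    }
    /* temp is always >= 1 here, so the integer log2 is well defined. */
    return (int)length + (floor_log2_uint(temp) << 1);
}
```

Because the result depends only on integer operations, the same input always maps to the same output, which makes it a convenient baseline when comparing against the values the kernel returns.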