Are uint2 operations faster than ulong operations in OpenCL on AMD GCN cards?


Which of these "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b;


AMD GCN has no native 64-bit integer support in its vector unit, so the second statement is translated into two 32-bit adds: a V_ADD_U32 followed by a V_ADDC_U32, which takes the carry flag from the first V_ADD_U32 into account.
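To make the add / add-with-carry pairing concrete, here is a minimal sketch in plain C that emulates it on the host. The helper name `add64_via_32` is hypothetical, and this only models the arithmetic, not the actual code the AMD compiler emits.

```c
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL or any AMD API): builds a 64-bit
   add out of two 32-bit adds, mirroring the V_ADD_U32 / V_ADDC_U32 pair. */
static uint64_t add64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* First add: low halves, wraps modulo 2^32 (role of V_ADD_U32). */
    uint32_t lo = a_lo + b_lo;

    /* Carry out of the low add: set exactly when the sum wrapped around. */
    uint32_t carry = (lo < a_lo) ? 1u : 0u;

    /* Second add: high halves plus the carry (role of V_ADDC_U32). */
    uint32_t hi = a_hi + b_hi + carry;

    return ((uint64_t)hi << 32) | lo;
}
```

For any inputs this should agree with a plain `a + b` on `uint64_t`; for example, adding 1 to 0xFFFFFFFF carries into the high word and gives 0x100000000.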

So to answer your question: both are the same in terms of instruction count. However, the two adds in the first statement are independent of each other and can be computed in parallel (instruction-level parallelism), so it could be faster IF your kernel is occupancy bound (i.e. using lots of registers).

If your statements can be executed by the scalar unit (i.e. they do not depend on the thread index), then the game changes and the second one will be just one instruction (vs. two), since the scalar unit has native 64-bit integer support.

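However, keep in mind that your first statement is not equivalent to the second: uint2 addition is component-wise, so the carry out of the low component is lost instead of being added to the high component.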
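A small host-side sketch in plain C shows the difference. The `uint2_emu` struct and its helpers are hypothetical stand-ins for OpenCL's `uint2`, and the sketch assumes `x` holds the low 32 bits and `y` the high 32 bits when the pair is read as one 64-bit value.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint2: two independent 32-bit lanes. */
typedef struct { uint32_t x, y; } uint2_emu;

/* Component-wise add, as "c = a + b" behaves on uint2:
   each lane wraps on its own and no carry moves from x to y. */
static uint2_emu uint2_add(uint2_emu a, uint2_emu b)
{
    uint2_emu c = { a.x + b.x, a.y + b.y };
    return c;
}

/* Assumed layout: x is the low word, y is the high word. */
static uint64_t uint2_to_u64(uint2_emu v)
{
    return ((uint64_t)v.y << 32) | v.x;
}

static uint2_emu u64_to_uint2(uint64_t v)
{
    uint2_emu r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}
```

With a = 0xFFFFFFFF and b = 1, the 64-bit add gives 0x100000000, while `uint2_add` on the split values gives x = 0 and y = 0, because the carry out of the low lane is dropped. The two forms only agree when the low halves do not overflow.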