Is there a bug in Cuda? I have run the following code on my GTX580 and r1 is zero at the end. I expect that it is one due to carry propagation? I have tested the code with Cuda Toolkit 4.2.9 and 5.5 and use "nvcc -arch=sm_20 bug.cu -o bug && ./bug" to compile and run it.
#include <stdio.h>
#include <cuda.h>
__global__ void bug()
{
unsigned int r1 = 0;
unsigned int r2 = 0;
asm( "\n\t"
"sub.cc.u32 %0, 0, 1;\n\t"
"addc.cc.u32 %1, 0, 0;\n\t"
: "=r"(r1), "=r"(r2) );
printf("r1 >> %04X\n", r1);
printf("r2 >> %04X\n", r2);
}
int main(void)
{
float *a_d;
cudaMalloc((void **) &a_d, 1);
bug <<< 1,1 >>> ();
cudaFree(a_d);
}
Output
r1 >> FFFFFFFF
r2 >> 0000
I believe you're making some assumptions about the
CC.CFflag referenced in the PTX ISA documentation that may not be valid.Note that the definition of specific states (e.g. 0 or 1) of this bit are never given that I can see. Furthermore, I don't find any mapping between the definition of "carry-in/carry-out" and "borrow-in/borrow-out"
Stated another way, I think you are assuming that a "borrow" status in this flag is identical to a "carry" status. In other words, you are assuming something like:
But such a truth table or mapping is never given. Furthermore the manual states:
I don't think your code satisfies the intent, nor do I think the above assumption of truth table for
CC.CFis valid.In fact what I think is happening is a truth table like this:
(the 0 and 1 here are arbitrary; that is also not defined in the manual.)
All examples of code I have tried (about 6 cases, including yours) have fit the definition I have given above.
Having said this, I would think it unwise to depend on this, as it is mostly undocumented. A safe rule for computer architecture is that undocumented behavior may change in the future.