I have tried the program below, which uses `atomicInc()`:
```cuda
#include <cstdio>

__global__ void ker(int *count)
{
    int n = 1;
    int x = atomicInc((unsigned int *)&count[0], n);
    // device-side printf (the original used the SDK's CUPRINTF / cuPrintf)
    printf("In kernel count is %d\n", count[0]);
}

int main()
{
    int hitCount[1];
    int *hitCount_d;
    hitCount[0] = 1;
    cudaMalloc((void **)&hitCount_d, 1 * sizeof(int));
    cudaMemcpy(&hitCount_d[0], &hitCount[0], 1 * sizeof(int), cudaMemcpyHostToDevice);
    ker<<<1, 4>>>(hitCount_d);
    // blocking copy: waits for the kernel and flushes its printf output
    cudaMemcpy(&hitCount[0], &hitCount_d[0], 1 * sizeof(int), cudaMemcpyDeviceToHost);
    printf("count is %d\n", hitCount[0]);
    cudaFree(hitCount_d);
    return 0;
}
```
The output is:

```
In kernel count is 1
In kernel count is 1
In kernel count is 1
In kernel count is 1
count is 1
```
I don't understand why it is not incrementing. Can anyone help?
Referring to the documentation, `atomicInc` reads the old value at the address you pass and, for the following call:

`atomicInc((unsigned int *)&count[0], n)`

computes:

`((count[0] >= n) ? 0 : (count[0] + 1))`

and stores the result back in `count[0]`, all as one atomic operation (the function returns the old value). If you're not sure what the `?` operator does, it is C's conditional operator: `a ? b : c` evaluates to `b` when `a` is true and to `c` otherwise.

Since you've passed `n` = 1 and `count[0]` starts out at 1, `atomicInc` never actually increments `count[0]` beyond 1. If you want to see it increment beyond 1, pass a larger value for `n`.

The variable `n` actually acts as a "rollover value" for the incrementing process. Once the variable being incremented has reached the value of `n`, the next `atomicInc` resets it to zero.

Although you haven't asked, you might wonder: "Why do I never see a value of zero, if I am hitting the rollover value?"
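To make the update rule concrete, here is a small host-side C++ model of what a single `atomicInc` call does to the value it points at. It is only a sketch: `atomicIncModel` is a hypothetical name, and unlike the real device function it is not atomic; it simply applies the documented `((old >= val) ? 0 : (old + 1))` rule and returns `old`.

```cpp
// Hypothetical, NON-atomic host model of the arithmetic performed by CUDA's atomicInc.
// It reads the old value, stores ((old >= val) ? 0 : old + 1), and returns the old value.
unsigned int atomicIncModel(unsigned int* address, unsigned int val) {
    unsigned int old = *address;
    *address = (old >= val) ? 0u : old + 1u;  // wrap to 0 once old has reached val
    return old;
}
```

For example, starting from 0 with `val` = 3, repeated calls leave the value at 1, 2, 3, 0, 1, and so on, which is the rollover behaviour described above; starting from 1 with `val` = 1, the very first call already wraps to 0.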
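To answer this, you must remember that all 4 of your threads are executing in lockstep: all 4 of them execute the `atomicInc` instruction before any of them executes the subsequent print statement. The four atomic updates are applied one after another, so `count[0]`, which starts out at 1, is rolled over to 0 by the first `atomicInc`, incremented to 1 by the second, rolled over to 0 by the third, and incremented back to 1 by the fourth. Then all 4 threads print out the value, which is 1.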
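The lockstep explanation above can be traced on the host with the sketch below. It assumes, as the explanation does, that all of the `atomicInc` calls complete (in some serialized order) before any thread reads `count[0]`; because every thread applies the same rule, the order in which they are serialized does not change the values. `traceAtomicInc` is a hypothetical helper, not a CUDA API.

```cpp
#include <vector>

// Hypothetical helper: returns the counter's starting value followed by its value
// after each of `threads` serialized atomicInc-style updates with rollover value `val`.
// Every thread applies the same rule, so the serialization order does not matter.
std::vector<unsigned int> traceAtomicInc(unsigned int start, unsigned int val, int threads) {
    std::vector<unsigned int> values{start};
    unsigned int count = start;
    for (int t = 0; t < threads; ++t) {
        count = (count >= val) ? 0u : count + 1u;  // atomicInc's update rule
        values.push_back(count);
    }
    return values;
}
```

`traceAtomicInc(1, 1, 4)` returns `{1, 0, 1, 0, 1}`, so the last value, the one every thread prints, is 1. Changing the thread count is a quick way to check your answer to the experiment below.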
As another experiment, try launching 5 threads instead of 4 and see if you can predict what value will be printed.
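As @talonmies indicated in the comments, if you swap your `atomicInc` for an `atomicAdd`, i.e. `int x = atomicAdd(&count[0], n);`, you'll get the results you were probably expecting: with `n` = 1 and 4 threads, `count[0]` goes from 1 to 5, and the host prints `count is 5`.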
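For comparison, the sketch below models both operations on the host with `std::atomic`, which is a C++ standard-library analogue of CUDA's atomics, not the CUDA API itself. `fetch_add` plays the role of `atomicAdd`, while the `atomicInc` wrap-around rule is built from a compare-exchange loop, a common way to implement a custom atomic read-modify-write operation. All function names here are hypothetical, and the updates are applied sequentially rather than from real GPU threads.

```cpp
#include <atomic>

// Host analogue of CUDA's atomicAdd: adds val atomically and returns the old value.
unsigned int hostAtomicAdd(std::atomic<unsigned int>& a, unsigned int val) {
    return a.fetch_add(val);
}

// Host analogue of atomicInc: atomically stores ((old >= val) ? 0 : old + 1), returns old.
// On failure, compare_exchange_weak reloads `old` with the current value, so the
// new value is recomputed from fresh data on every retry.
unsigned int hostAtomicInc(std::atomic<unsigned int>& a, unsigned int val) {
    unsigned int old = a.load();
    while (!a.compare_exchange_weak(old, (old >= val) ? 0u : old + 1u)) {
        // retry with the freshly loaded `old`
    }
    return old;
}

// Applies `updates` atomicAdd-style updates starting from `start`; returns the final value.
unsigned int finalAfterAdds(unsigned int start, unsigned int val, int updates) {
    std::atomic<unsigned int> count{start};
    for (int i = 0; i < updates; ++i) {
        hostAtomicAdd(count, val);
    }
    return count.load();
}

// Applies `updates` atomicInc-style updates starting from `start`; returns the final value.
unsigned int finalAfterIncs(unsigned int start, unsigned int val, int updates) {
    std::atomic<unsigned int> count{start};
    for (int i = 0; i < updates; ++i) {
        hostAtomicInc(count, val);
    }
    return count.load();
}
```

Starting from 1 with `n` = 1 and four updates, `finalAfterAdds(1, 1, 4)` gives 5 while `finalAfterIncs(1, 1, 4)` gives 1, matching the final counts of the `atomicAdd` and `atomicInc` kernels respectively.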