I tried the program below, which uses atomicInc().
__global__ void ker(int *count)
{
    int n = 1;
    int x = atomicInc((unsigned int *)&count[0], n);
    CUPRINTF("In kernel count is %d\n", count[0]);
}

int main()
{
    int hitCount[1];
    int *hitCount_d;

    hitCount[0] = 1;
    cudaMalloc((void **)&hitCount_d, 1 * sizeof(int));
    cudaMemcpy(&hitCount_d[0], &hitCount[0], 1 * sizeof(int), cudaMemcpyHostToDevice);
    ker<<<1,4>>>(hitCount_d);
    cudaMemcpy(&hitCount[0], &hitCount_d[0], 1 * sizeof(int), cudaMemcpyDeviceToHost);
    printf("count is %d\n", hitCount[0]);
    return 0;
}
Output is:
In kernel count is 1
In kernel count is 1
In kernel count is 1
In kernel count is 1
count is 1
I don't understand why it is not incrementing. Can anyone help?
Referring to the documentation, atomicInc does this: for the following call:
atomicInc((unsigned int *)&count[0], n);
it computes:
((count[0] >= n) ? 0 : (count[0] + 1))
and stores the result back in count[0].
(If you're not sure what the ? operator does, it is C's conditional operator: a ? b : c evaluates to b when a is true and to c otherwise.)
Since you've passed n = 1, and count[0] starts out at 1, atomicInc never actually increments the variable count[0] beyond 1.
If you want to see it increment beyond 1, pass a larger value for n.
The variable n actually acts as a "rollover value" for the incrementing process. When the variable to be incremented reaches the value of n, the next atomicInc will reset it to zero.
Although you haven't asked the question, you might ask, "Why do I never see a value of zero, if I am hitting the rollover value?"
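To answer this, you must remember that all 4 of your threads are executing in lockstep. All 4 of them execute the atomicInc instruction before any of them executes the subsequent print statement.
Therefore we have a variable, count[0], which starts out at 1. The atomic operations are serialized, so:
The first thread to execute the atomic resets it to zero.
The next thread increments it to 1.
The third thread resets it to zero.
The fourth and final thread increments it to 1.
Then all 4 threads print out the value.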
As another experiment, try launching 5 threads instead of 4 (that is, ker<<<1,5>>>(hitCount_d);) and see if you can predict what value will be printed.
As @talonmies indicated in the comments, if you swap your atomicInc for an atomicAdd:
int x = atomicAdd(&count[0], n);
you'll get the results you were probably expecting.