I need to know something about CUDA shared memory. Suppose I launch 50 blocks with 10 threads per block on a G80 card. Each SM of a G80 can handle 8 blocks simultaneously. Assume that, after doing some calculations, the shared memory is fully occupied.
What will the values in shared memory be when the next 8 blocks arrive? Will the previous values still reside there? Or will the previous values be copied to global memory and the shared memory refreshed for the next 8 blocks?
The CUDA Programming Guide states the following about the variable type qualifiers:

- `__device__ __shared__` — variable in shared memory, visible to one block, which only lives for the duration of the kernel
- `__device__` — variable in global memory, visible to the grid, which lives until the application exits
- `__device__ __constant__` — variable in constant memory, visible to the grid, which lives until the application exits

Thus, from this reference, the answer to your question is that shared memory is refreshed for the next 8 blocks: the previous blocks' values do not persist, and they are not automatically copied to global memory. If you need to keep any results, the kernel must explicitly write them from shared memory back to global memory before the block finishes.
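A minimal sketch (kernel name and reduction logic are my own illustration, not from your code) of the usual pattern this implies: each block stages data in `__shared__` memory, computes, and explicitly stores its result to global memory before the block retires, because the shared memory contents are discarded once the block completes:

```cuda
#define THREADS_PER_BLOCK 10

// Hypothetical kernel: each block sums its 10 input elements in
// shared memory, then writes the per-block sum to global memory.
__global__ void blockSum(const float *in, float *blockResults)
{
    // Allocated per block; only valid while this block is resident
    // on an SM. A later block scheduled on the same SM reuses this
    // physical storage.
    __shared__ float buf[THREADS_PER_BLOCK];

    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Naive reduction by thread 0 (fine for 10 elements).
    if (tid == 0) {
        float sum = 0.0f;
        for (int i = 0; i < blockDim.x; ++i)
            sum += buf[i];
        // Without this explicit store, the sum is lost when the
        // block finishes -- shared memory is NOT copied anywhere.
        blockResults[blockIdx.x] = sum;
    }
}
```

So when the next wave of blocks lands on an SM, they should treat shared memory as uninitialized garbage, never as data left over from previous blocks.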