I was reading this presentation document: http://on-demand.gputechconf.com/gtc-express/2011/presentations/register_spilling.pdf
On page 3 of the presentation, the author states:
A store always happens before a load – Only GPU threads can access LMEM addresses
Can anybody explain to me why? Does he mean when the local memory is first initialised?
In this respect, local memory is something like shared memory.
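To do anything useful with shared memory, you first have to initialize it (store something to it). The same is true for local memory: until a thread writes to it, its contents are undefined, so any meaningful load must be preceded by a store.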
Only CUDA thread code can access local memory. There are no CUDA API calls, such as cudaMemcpy, that can access local memory, so it is not possible to initialize local memory from host code. The same comments are basically true for shared memory.
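To make this concrete, here is a minimal sketch of a kernel where each thread has to store to its own private array before it can load anything meaningful from it. The names storeThenLoad, countMismatches and SCRATCH are made up for this illustration. Whether the array actually ends up in local memory is up to the compiler; a dynamically indexed per-thread array like this one is a typical candidate, and compiling with nvcc -Xptxas -v should report the local memory usage.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int SCRATCH = 64;  // hypothetical size of the per-thread array

// Hypothetical kernel: each thread fills a private array, then reads it back.
__global__ void storeThenLoad(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Private to this thread. Assumption: because it is indexed with a
    // runtime value below, the compiler may place it in local memory.
    // Its contents are undefined at this point.
    int scratch[SCRATCH];

    // Stores: only the thread itself can put data here.
    for (int i = 0; i < SCRATCH; ++i)
        scratch[i] = tid + i;

    // Load: meaningful only because the stores above came first.
    out[tid] = scratch[(tid * 7) % SCRATCH];
}

// Runs the kernel and returns the number of wrong results (-1 on a CUDA error).
int countMismatches(int n)
{
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return -1;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    storeThenLoad<<<blocks, threads>>>(dOut, n);

    // cudaMemcpy can reach the global buffer dOut, but there is no way
    // to name or copy the per-thread scratch array from the host.
    std::vector<int> hOut(n);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int t = 0; t < n; ++t)
        if (hOut[t] != t + (t * 7) % SCRATCH) ++bad;
    return bad;
}

int main()
{
    int bad = countMismatches(256);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The host only ever touches the global buffer. The scratch array has no address the host can see, which is why the first access to it that produces useful data has to be a store issued by the thread that owns it.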