Is local memory access coalesced?

895 Views Asked by AdelNick At 06 September 2011 at 07:09

Suppose I declare a local variable in a CUDA kernel function, one per thread:

float f = ...; // some calculations here

Suppose also that the compiler places this variable in local memory (which, as far as I know, is the same as global memory except that it is visible to one thread only). My question is: will reads of f be coalesced?


There are 2 best solutions below

talonmies On 06 September 2011 at 08:47 BEST ANSWER

I don't believe there is official documentation of how local memory (or the stack on Fermi) is laid out in memory, but I am pretty certain that multiprocessor allocations are accessed in a "striped" fashion, so that non-diverging threads in the same warp get coalesced access to local memory. On Fermi, local memory is also cached using the same L1/L2 access mechanism as global memory.

John Gordon On 22 September 2011 at 13:15

CUDA cards don't have memory allocated for local variables. All local variables are stored in registers. Complex kernels with lots of variables reduce the number of threads that can run concurrently, a condition known as low occupancy.
