CUDA programming - L1 and L2 caches

2.7k Views Asked by Saman I At 16 April 2012 at 20:10

Could you please explain the difference between using both the L1 and L2 caches and using only the L2 cache in CUDA programming? What should I expect in terms of execution time? When should I expect a shorter GPU time: when I enable both L1 and L2, or when I enable only L2? Thanks.


Tom On 16 April 2012 at 21:50 BEST ANSWER

Typically you would leave both L1 and L2 caches enabled. You should try to coalesce your memory accesses as much as possible, i.e. threads within a warp should access data within the same 128B segment as much as possible (see the CUDA Programming Guide for more info on this topic).
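To make the coalescing point concrete, here is a minimal host-side sketch (plain C++, no GPU required) that counts how many 128B segments one warp touches for a given access pattern. The function names `segments_touched` and `strided_warp` are made up for illustration, and the sketch assumes 32 threads per warp, 4-byte aligned elements, and a 128B line as described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumptions for this sketch: 32 threads per warp, 128-byte segments.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;

// Count the distinct 128-byte segments touched by one warp, given the
// byte address that each thread reads (hypothetical helper).
std::size_t segments_touched(const std::vector<std::uint64_t>& addrs) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) {
        segments.insert(a / kSegmentBytes);
    }
    return segments.size();
}

// Build the byte addresses for a warp in which thread t reads the 4-byte
// element at index t * stride_elems, starting from byte address base.
std::vector<std::uint64_t> strided_warp(std::uint64_t base,
                                        std::size_t stride_elems) {
    std::vector<std::uint64_t> addrs;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        addrs.push_back(base + t * stride_elems * 4);
    }
    return addrs;
}
```

With a stride of 1 from an aligned base, the whole warp falls into a single segment (fully coalesced). With a stride of 32 elements, every thread lands in its own segment, so the warp touches 32 segments to obtain the same 128 useful bytes.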

Some programs cannot be optimised in this manner, for example because their memory accesses are completely random. In those cases it may be beneficial to bypass the L1 cache, thereby avoiding loading an entire 128B line when you only want, for example, 4 bytes (you'll still load 32B, since that is the minimum). The efficiency gain is clear: 4 useful bytes out of 128 (about 3%) improves to 4 out of 32 (12.5%).
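The arithmetic behind that efficiency claim can be sketched as follows. This is plain C++ for illustration only; `load_efficiency` and `bytes_fetched` are hypothetical helper names, and the 128B and 32B granularities are the ones quoted in the answer.

```cpp
#include <cstddef>

// Transfer granularities quoted in the answer above.
constexpr std::size_t kL1LineBytes = 128;    // load cached in L1 and L2
constexpr std::size_t kL2SegmentBytes = 32;  // load served by L2 only

// Fraction of the fetched bytes that the program actually uses.
double load_efficiency(std::size_t useful_bytes, std::size_t fetched_bytes) {
    return static_cast<double>(useful_bytes) /
           static_cast<double>(fetched_bytes);
}

// Total bytes moved for num_loads fully scattered loads, assuming each
// load lands in a different line or segment of the given granularity.
std::size_t bytes_fetched(std::size_t num_loads, std::size_t granularity) {
    return num_loads * granularity;
}
```

For a scattered 4-byte load, `load_efficiency(4, kL1LineBytes)` is 0.03125 and `load_efficiency(4, kL2SegmentBytes)` is 0.125, so bypassing L1 moves a quarter of the data for the same useful bytes. On Fermi-class GPUs this bypass is typically requested at compile time with nvcc's `-Xptxas -dlcm=cg` option; check the documentation for your toolkit and architecture before relying on it.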
