If each piece of data takes 128 bits or more, is there any advantage to grouping them in memory?

166 Views Asked by subb At 07 August 2013 at 01:07

I've read in the CUDA Programming Guide that global memory on a CUDA device is accessed in transactions of 32, 64 or 128 bits. Knowing that, is there any advantage to, say, having a set of float4 values (128 bits each) close together in memory? As I understand it, whether the float4 values are scattered through memory or stored in sequence, the number of transactions will be the same. Or will all the accesses be coalesced into one gigantic transaction?


There is 1 best solution below

Robert Crovella On 07 August 2013 at 02:11 BEST ANSWER

Coalescing refers to combining memory requests from individual threads in a warp into a single memory transaction.

A single memory transaction is typically a 128-byte cache line, so it holds eight 128-bit (e.g. float4) quantities.

So, yes, there is a benefit to having multiple threads request adjacent 128-bit quantities, because these can still be coalesced into a single (128-byte) cache line request to memory.
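
To make the arithmetic concrete, here is a rough host-side sketch in plain C++ (it does not run on a GPU) that counts how many 128-byte lines one warp would touch for a given access pattern. The warp size of 32, the 128-byte line size, and the helper names `linesTouched` and `warpOffsets` are assumptions made for illustration; real hardware may split requests differently, for example into 32-byte sectors on some architectures.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed figures (not queried from any device): 32 threads per warp,
// 128-byte cache lines, 16-byte (128-bit) elements such as float4.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kLineBytes = 128;
constexpr std::size_t kElemBytes = 16;

// Count the distinct 128-byte lines touched when each thread of one warp
// reads one 16-byte element starting at the given byte offset.
std::size_t linesTouched(const std::vector<std::uint64_t>& byteOffsets) {
    std::set<std::uint64_t> lines;
    for (std::uint64_t off : byteOffsets) {
        lines.insert(off / kLineBytes);                     // line holding the first byte
        lines.insert((off + kElemBytes - 1) / kLineBytes);  // line holding the last byte
    }
    return lines.size();
}

// Hypothetical helper: thread t reads element number t * strideElems.
std::vector<std::uint64_t> warpOffsets(std::size_t strideElems) {
    std::vector<std::uint64_t> offsets;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        offsets.push_back(t * strideElems * kElemBytes);
    }
    return offsets;
}
```

Under these assumptions, `linesTouched(warpOffsets(1))` (adjacent float4 values) gives 4 lines for the whole warp, eight threads per line, while `linesTouched(warpOffsets(8))` (one float4 every 128 bytes) gives 32 lines, one per thread. Both cases request the same amount of useful data, but the scattered one pulls eight times as many lines from memory.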
