CUDA Fermi's Architecture: Memory structure


I have a question about CUDA's Fermi architecture. I read somewhere that on Fermi, global memory access is as fast as shared memory access because the architecture now uses uniform addressing.

So is it true that I can access data in global memory with no (significant) latency, unlike on pre-Fermi GPUs?

This matters to me because I'm writing code for an Nvidia Tesla GPU without having access to one (it's in the university's lab, and I can't get to it during the summer...).


There is 1 best solution below


This is not true. Global memory access on Fermi still has much higher latency than shared memory access. However, Fermi puts caches (L1 and L2) in front of global memory, so an access may hit the cache, which reduces the latency. This is particularly useful for less-than-ideal memory access patterns (e.g. slightly misaligned accesses).
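If you want to see the effect of misalignment yourself once you have the card back, something along these lines should do. It is only a rough sketch: the names `offsetCopy` and `timeOffsetCopy` and the array size are made up for illustration, error checking is omitted, and the timings you get will depend on the GPU and driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Copies in[i] to out[i] starting at index `offset`. A nonzero offset shifts
// each warp's accesses off a 128-byte boundary, i.e. makes them misaligned.
__global__ void offsetCopy(const float *in, float *out, int offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + offset;
    if (i < n) out[i] = in[i];
}

// Hypothetical helper: times a single launch with CUDA events, returns ms.
float timeOffsetCopy(const float *d_in, float *d_out, int offset, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n - offset + threads - 1) / threads;

    cudaEventRecord(start);
    offsetCopy<<<blocks, threads>>>(d_in, d_out, offset, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 22;  // arbitrary size, chosen only for the demo
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    timeOffsetCopy(d_in, d_out, 0, n);  // warm-up launch, result discarded
    for (int offset = 0; offset <= 2; ++offset)
        printf("offset %d: %.3f ms\n", offset,
               timeOffsetCopy(d_in, d_out, offset, n));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

On a cached architecture the offset 1 and offset 2 runs would be expected to land fairly close to the aligned run, but measure rather than assume.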

Uniform memory addressing is a completely different thing, unrelated to the above. It allows the GPU to determine at runtime whether a given pointer refers to global or shared (or even mapped pinned host, or another GPU's) memory. On pre-Fermi cards, the type of memory had to be deducible at compile time.
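To make the addressing point concrete, here is a small sketch (assuming compute capability 2.0 or newer). The names `sumFour`, `genericPointerDemo` and `runGenericPointerDemo` are invented for this example, and error checking is mostly left out. The same device function is called once with a global pointer and once with a shared pointer, and the `__isGlobal()` intrinsic reports at runtime which space each pointer refers to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The pointer parameter carries no address-space annotation; the memory
// space it refers to is resolved when the load executes.
__device__ float sumFour(const float *p)
{
    return p[0] + p[1] + p[2] + p[3];
}

__global__ void genericPointerDemo(const float *gdata, float *out, int *isGlobalFlags)
{
    __shared__ float sdata[4];
    if (threadIdx.x < 4) sdata[threadIdx.x] = 10.0f * gdata[threadIdx.x];
    __syncthreads();

    if (threadIdx.x == 0) {
        out[0] = sumFour(gdata);  // pointer into global memory
        out[1] = sumFour(sdata);  // pointer into shared memory
        isGlobalFlags[0] = __isGlobal(gdata) ? 1 : 0;
        isGlobalFlags[1] = __isGlobal(sdata) ? 1 : 0;
    }
}

// Hypothetical host wrapper: fills h_out[2] and h_flags[2], returns true
// if the final copy back from the device succeeded.
bool runGenericPointerDemo(float *h_out, int *h_flags)
{
    const float h_in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *d_in = nullptr, *d_out = nullptr;
    int *d_flags = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, 2 * sizeof(float));
    cudaMalloc(&d_flags, 2 * sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    genericPointerDemo<<<1, 32>>>(d_in, d_out, d_flags);

    cudaMemcpy(h_out, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(h_flags, d_flags, 2 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_flags);
    return err == cudaSuccess;
}

int main()
{
    float out[2] = {0.0f, 0.0f};
    int flags[2] = {-1, -1};
    if (!runGenericPointerDemo(out, flags)) return 1;
    printf("global sum %.1f (isGlobal=%d), shared sum %.1f (isGlobal=%d)\n",
           out[0], flags[0], out[1], flags[1]);
    return 0;
}
```

Note that this only shows that one code path can serve both memory spaces. It says nothing about speed: the load through the global pointer still pays global memory latency.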