I have a question about the CUDA Fermi architecture: I've read somewhere that in Fermi, global memory access is as fast as shared memory access, simply because the architecture now uses uniform addressing.
So is it true that I can access data in global memory with no (significant) latency, unlike on pre-Fermi GPUs?
It's very important for me to know this, because I'm writing code for an Nvidia Tesla GPU without having access to it (it's in the university's lab, and I can't get to it during the summer...).
This is not true. Global memory access on Fermi still has much higher latency than shared memory access. However, thanks to the caches, an access may hit in cache, which reduces the latency. This is particularly useful for less-than-ideal memory access patterns (e.g. slightly misaligned accesses).
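To make the latency difference concrete, here is a minimal sketch (my own illustration, not from the question) of the common pattern of staging global data into shared memory: each value is fetched from slow global memory once, then reused through fast shared memory. The kernel name and the assumption that `n` is a multiple of `blockDim.x` are mine.

```cuda
// Sketch: reverse each block-sized tile of an array.
// Assumes n is a multiple of blockDim.x (here 256) for simplicity.
__global__ void reverse_tiles(const float *in, float *out, int n)
{
    __shared__ float tile[256];                  // fast on-chip shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];               // one (slow) global read per element
    __syncthreads();
    if (i < n)                                   // reuse happens in shared memory,
        out[i] = tile[blockDim.x - 1 - threadIdx.x]; // avoiding extra global traffic
}
```

On Fermi the L1/L2 caches soften the cost of repeated or slightly irregular global accesses, but explicitly staging reused data into shared memory remains the reliable way to avoid paying global-memory latency more than once.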
Uniform memory addressing is a completely different thing, unrelated to the above. It allows the GPU to determine at run time whether a given pointer refers to global or shared memory (or even mapped pinned host memory, or another GPU's memory). On pre-Fermi cards, the memory space had to be deducible at compile time.
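What uniform addressing actually buys you is that a single `__device__` function can take a generic pointer and be called with either a global or a shared pointer, with the hardware resolving the memory space at run time. A small sketch (my own example; the function names are hypothetical):

```cuda
// With uniform (generic) addressing on Fermi (sm_20+), sum4 works on a
// pointer into any memory space; pre-Fermi, the compiler had to know the
// space statically, so such a function could not mix the two.
__device__ float sum4(const float *p)
{
    return p[0] + p[1] + p[2] + p[3];
}

__global__ void demo(const float *gdata, float *out)
{
    __shared__ float sdata[4];
    if (threadIdx.x < 4)
        sdata[threadIdx.x] = gdata[threadIdx.x] * 2.0f;
    __syncthreads();
    if (threadIdx.x == 0) {
        out[0] = sum4(gdata);   // generic pointer resolving to global memory
        out[1] = sum4(sdata);   // same function, shared-memory pointer
    }
}
```

This is a convenience for the compiler and for writing generic device code; it says nothing about how fast the underlying memory is.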