Question about resident warps in CUDA


I have been using CUDA for a month, and now I'm trying to understand how many warps/blocks are needed to hide the latency of memory accesses. I think it is related to the maximum number of resident warps on a multiprocessor.

According to Table 13 in the CUDA C Programming Guide (v7.5), the maximum number of resident warps per multiprocessor is 64. My question is: what is a resident warp? Does it refer to the warps whose data has already been read from GPU memory and that are ready to be processed by the SPs? Or does it refer to both the warps that can read memory for data and the warps that are ready to be processed by the SPs, which would mean that all warps other than those 64 can neither read memory nor be processed by the SPs until some of the 64 resident warps are done?


There is 1 best solution below


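The maximum number of resident warps is the maximum number of warps that can be in flight on a multiprocessor at the same time. A warp is active (resident) once its block has been assigned to the multiprocessor and its registers and other resources have been allocated; the warp scheduler then picks among these warps when issuing instructions.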

If you manage to have that many warps resident at once, you reach the theoretical maximum occupancy (100%, or 1:1). Otherwise, the occupancy ratio is lower.
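To make the occupancy ratio concrete, here is a minimal sketch that computes it from the per-SM warp and block limits only. The `SmLimits` struct and `theoreticalOccupancy` function are hypothetical names for illustration, and the limit values are inputs you have to supply for your device. Registers and shared memory are deliberately ignored; on a real GPU the runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor` takes those into account.

```cpp
#include <algorithm>

// Hypothetical helper types for illustration; not part of the CUDA API.
struct SmLimits {
    int maxWarpsPerSm;   // e.g. 64, the figure quoted in the question
    int maxBlocksPerSm;  // assumed input; look it up for your compute capability
    int warpSize;        // 32 on current NVIDIA GPUs
};

// Theoretical occupancy of one SM, considering ONLY the warp and block
// limits (register and shared-memory limits are ignored in this sketch).
// Assumes threadsPerBlock is positive and fits within the per-SM warp limit.
double theoreticalOccupancy(const SmLimits& sm, int threadsPerBlock) {
    // A block always occupies whole warps, so round up.
    int warpsPerBlock = (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
    // Resident blocks are capped by the block limit and by the warp limit.
    int blocksByWarps = sm.maxWarpsPerSm / warpsPerBlock;
    int residentBlocks = std::min(sm.maxBlocksPerSm, blocksByWarps);
    int residentWarps = residentBlocks * warpsPerBlock;
    return static_cast<double>(residentWarps) / sm.maxWarpsPerSm;
}
```

With assumed limits of 64 warps and 32 blocks per SM, `theoreticalOccupancy({64, 32, 32}, 256)` gives 1.0, 32-thread blocks give 0.5 (the block limit is hit first), and 768-thread blocks give 0.75 (only two blocks of 24 warps fit).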

Warps of blocks that are not yet resident have to wait until a resident block finishes and frees its resources.

Might be related to this question on SO.


Edited answer for further questions:

  1. Warps

About the maximum number of warps that can be processed: each SM (streaming multiprocessor) has a limited number of processing cores, and the GPU has a limited number of SMs. Even though this webinar is not up to date with newer architectures, it gives some good examples:

SM – Streaming multi-processors with multiple processing cores

Each SM contains 32 processing cores

Execute in a Single Instruction Multiple Thread (SIMT) fashion

Up to 16 SMs on a card for a maximum of 512 compute cores

And :

Fermi can have up to 48 active warps per SM (1536 threads)
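The figures quoted above are consistent with each other, as this small compile-time check shows. The constant and function names are made up for illustration; the numbers are the ones from the webinar quotes, plus the standard warp size of 32.

```cpp
// Figures quoted above for Fermi; names are illustrative only.
constexpr int kWarpSize = 32;             // threads per warp
constexpr int kMaxActiveWarpsPerSm = 48;  // from the webinar quote
constexpr int kCoresPerSm = 32;           // from the webinar quote
constexpr int kMaxSms = 16;               // from the webinar quote

constexpr int maxThreadsPerSm() { return kMaxActiveWarpsPerSm * kWarpSize; }
constexpr int maxCores() { return kMaxSms * kCoresPerSm; }
// Resident threads per core on one SM.
constexpr int residentThreadsPerCore() { return maxThreadsPerSm() / kCoresPerSm; }

static_assert(maxThreadsPerSm() == 1536, "48 warps x 32 threads");
static_assert(maxCores() == 512, "16 SMs x 32 cores");
static_assert(residentThreadsPerCore() == 48, "1536 threads over 32 cores");
```

The point of the last line is that an SM can hold far more resident threads than it has cores. The resident warps are not all executing in the same cycle; they are the pool the scheduler draws from.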

  2. Processing warps

First, some of these terms are not always clearly or officially defined; see for example this topic from Nvidia DevTalk.

As explained in that topic, a given warp is active once it has been allocated on the SM along with its resources. It can then be:

  • eligible : it can issue an operation
  • stalled : it cannot because of a resource/data dependency

This is possible because the GPU uses a SIMT (Single Instruction, Multiple Threads) architecture. There is a lot of reading available on this topic, which can be very useful if you plan on tuning occupancy.
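The eligible/stalled distinction is what ties resident warps to latency hiding. Below is a toy model, not a description of real hardware: it assumes a single scheduler with one issue slot per cycle, and that every instruction stalls its warp for the same fixed `latency`. The function name `issueSlotUtilization` is hypothetical. Real GPUs have several schedulers per SM, varying latencies, and instruction-level parallelism within a warp.

```cpp
#include <vector>

// Toy model (assumption, not real hardware): one scheduler, one issue slot
// per cycle, and each issued instruction stalls its warp for `latency` cycles.
// Returns the fraction of cycles in which some warp could issue.
double issueSlotUtilization(int residentWarps, int latency, int cycles) {
    // readyAt[w] is the first cycle at which warp w is eligible again.
    std::vector<int> readyAt(residentWarps, 0);
    int issued = 0;
    for (int t = 0; t < cycles; ++t) {
        // Pick the lowest-numbered eligible warp; real schedulers use
        // other policies, but the utilization in this model is the same.
        for (int w = 0; w < residentWarps; ++w) {
            if (readyAt[w] <= t) {         // eligible
                readyAt[w] = t + latency;  // now stalled until its data arrives
                ++issued;
                break;                     // only one issue per cycle
            }
        }
    }
    return static_cast<double>(issued) / cycles;
}
```

In this model, with a latency of 8 cycles, 4 resident warps keep the issue slot busy only half of the time, while 8 or more keep it fully busy. Under these simplified assumptions, the number of warps needed to hide a given latency grows with that latency, which is why the resident-warp limit matters.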