Especially on the Turing and Ampere architectures: in the same SM and the same warp scheduler, can warps run ld/st and arithmetic instructions simultaneously?

I want to understand how the warp scheduler works.


In the same SM and the same warp scheduler, can the warps run ld/st and other arithmetic instructions simultaneously?

No, not if "simultaneously" means "issued in the same clock cycle".

In current CUDA GPUs, including Turing and Ampere, when a warp scheduler issues an instruction in a given clock cycle, it issues that same instruction to all threads in the warp.

Different instructions can of course run in different clock cycles. Different instructions can also run in the same clock cycle, provided they are issued by different warp schedulers in the SM. That in turn implies that those instructions are issued to distinct functional units within the SM.

So, for example, an integer add instruction issued by warp scheduler 0 would have to go to different functional units from a load/store instruction issued by warp scheduler 1 in the same SM. In this example the instructions are of different types, so they need different functional units anyway, and the point is self-evident.

But even if both warp schedulers were issuing, for example, FADD (for 2 different warps), they would have to issue to separate floating-point functional units in the SM.

In modern CUDA GPUs, the SM is partitioned so that each warp scheduler has its own execution resources (functional units) for at least some instruction types, such as FADD. In the FADD example, then, the two instructions would go to separate functional units anyway, because of that partitioning.
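
To make the issue rule above concrete, here is a minimal toy model in Python. It is only a sketch of the rule "one warp instruction per scheduler per clock cycle", not a hardware simulation: the function name `simulate`, the round-robin warp selection, and the instruction streams are all assumptions made for illustration, and the model ignores latency, dependencies, and stalls.

```python
from collections import defaultdict


def simulate(warps_by_scheduler, cycles):
    """Toy issue model: each scheduler issues at most one warp instruction per cycle.

    warps_by_scheduler maps a scheduler id to a list of instruction streams,
    one stream (a list of opcode strings) per warp owned by that scheduler.
    Returns a list of (cycle, scheduler, warp, opcode) tuples.
    """
    # Index of the next instruction for every (scheduler, warp) pair.
    pc = {(s, w): 0
          for s, warps in warps_by_scheduler.items()
          for w in range(len(warps))}
    # Round-robin pointer per scheduler (an assumed policy, for illustration only).
    next_warp = {s: 0 for s in warps_by_scheduler}
    trace = []
    for cycle in range(cycles):
        for s, warps in warps_by_scheduler.items():
            # Find one warp with work left, starting at the round-robin pointer.
            for offset in range(len(warps)):
                w = (next_warp[s] + offset) % len(warps)
                if pc[(s, w)] < len(warps[w]):
                    trace.append((cycle, s, w, warps[w][pc[(s, w)]]))
                    pc[(s, w)] += 1
                    next_warp[s] = (w + 1) % len(warps)
                    break  # only one issue slot per scheduler per cycle
    return trace


# Hypothetical instruction streams: two schedulers, two warps each.
warps_by_scheduler = {
    0: [["IADD", "LD"], ["LD", "IADD"]],
    1: [["LD", "FADD"], ["FADD", "ST"]],
}
trace = simulate(warps_by_scheduler, cycles=4)

# Group the trace by cycle to see what was issued together.
issued = defaultdict(list)
for cycle, sched, warp, op in trace:
    issued[cycle].append((sched, warp, op))
for cycle in sorted(issued):
    print(cycle, issued[cycle])
```

In this model, cycle 0 has scheduler 0 issuing IADD while scheduler 1 issues LD, which mirrors the example above. No scheduler ever appears twice in the same cycle, so a load/store and an arithmetic instruction from the same scheduler always land in different cycles.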