Why the distinction between WMMA and "just" MMA instructions?

Asked by einpoklum on 14 March 2024 at 16:22 (68 views)

After reading the answer to this question:

Does PTX (8.4) not cover smaller-shape WMMA instructions?

and re-reading the section of the PTX ISA reference distinguishing WMMA from MMA instructions, I wonder - why the distinction?

That is,

Why do some of the instructions get the w prefix? After all, some of the non-w MMA operations are also warp-wide...
Why don't we just have warp-wide mma.load and mma.store instructions which take care of loading the data into registers?
Why is there no coverage, by intrinsics and templates (e.g. fragment<...>), of all of the matrix-multiply-add-related PTX instructions?
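
For context, here is the kind of fragment-based code I have in mind. It is only a minimal sketch, assuming a single 16x16x16 tile with half inputs and float accumulators on a GPU of compute capability 7.0 or later. The kernel name wmma_tile and the all-ones test data are my own, for illustration only. As far as I can tell, these nvcuda::wmma calls correspond to the wmma.load, wmma.mma and wmma.store PTX instructions, while the plain mma instructions have no comparable template wrapper and need inline PTX.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes D = A * B for a single 16x16x16 tile.
// (wmma_tile is a hypothetical name chosen for this sketch.)
__global__ void wmma_tile(const half* a, const half* b, float* d) {
    // Fragments are opaque: each thread of the warp holds part of the tile.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);

    // Warp-wide loads from memory into registers (leading dimension 16).
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);

    // Warp-wide multiply-accumulate: acc = a * b + acc.
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    // Warp-wide store of the accumulator tile back to memory.
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

int main() {
    half* a;
    half* b;
    float* d;
    cudaMallocManaged(&a, 16 * 16 * sizeof(half));
    cudaMallocManaged(&b, 16 * 16 * sizeof(half));
    cudaMallocManaged(&d, 16 * 16 * sizeof(float));

    // All-ones inputs, so the result does not depend on the matrix layouts.
    for (int i = 0; i < 16 * 16; ++i) {
        a[i] = __float2half(1.0f);
        b[i] = __float2half(1.0f);
    }

    // Exactly one warp: the wmma operations are collective over 32 threads.
    wmma_tile<<<1, 32>>>(a, b, d);
    cudaDeviceSynchronize();

    // Each output element is a sum of 16 products of 1 * 1.
    printf("d[0] = %f\n", d[0]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(d);
    return 0;
}
```

This should build with something like nvcc -arch=sm_70. The point is that loading, multiplying and storing are all expressed through fragment<...>, which is exactly the coverage I don't see for the non-w instructions.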
