Why the distinction between WMMA and "just" MMA instructions?

After reading the answer to this question:

Does PTX (8.4) not cover smaller-shape WMMA instructions?

and re-reading the section of the PTX ISA reference that distinguishes WMMA from MMA instructions, I wonder: why the distinction?

That is,

  • Why do some of the instructions get the w prefix? After all, some of the non-w MMA operations are warp-wide too...
  • Why don't we just have warp-wide mma.load and mma.store instructions which take care of loading the data into registers and storing it back?
  • Why is there no coverage by intrinsics and templates (e.g. fragment<...>) of all of the matrix-multiply-add-related PTX instructions?