In the CUDA Programming Guide, v11.7, section B.24.6, Element Types & Matrix Sizes, there's a table of supported type combinations, in which the multiplicand types are either sub-single-precision floating-point types or double - never `float`. But in section B.24.1, Description, it says that:
The data type, T [for matrix fragments], may be double, float, __half, __nv_bfloat16, char, or unsigned char for multiplicands and double, float, int, or __half for accumulators.
So, can the multiplicand matrices be `float`, or can't they?
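In `wmma`-API terms, the combination I'm asking about would be a fragment declaration along these lines (hypothetical; whether anything like this is actually usable is exactly the question):

```cuda
#include <mma.h>
using namespace nvcuda;

__global__ void float_multiplicand_question()
{
    // Hypothetical: a multiplicand fragment with float elements, using the
    // familiar 16x16x16 shape. No such combination appears in the B.24.6 table.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, float, wmma::row_major> a_frag;
    (void) a_frag; // suppress unused-variable warning
}
```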
Probably not: single-precision floating-point multiplicands are not supported.
The PTX ISA guide lists the lower-level WMMA primitives and their supported operand combinations in Section 9.7.13; and, indeed, there are no primitives where single-precision floating-point (`f32`) can be the data type of the multiplicand matrices. The closest we can get is `tf32` for the multiplicands and `f32` for the addend and the result. Now, if PTX doesn't have the primitives we're after, it is all but impossible that they exist on the actual micro-architectures and are simply not exposed (and the compiler will not be able to optimize other PTX code into FP32 WMMA).

Note that double-precision multiplicands are supported (although YMMV when it comes to their speed).
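In CUDA C++ terms, that closest-available combination corresponds to fragment types like the following (a sketch against the `nvcuda::wmma` API; `m16n16k8` is the shape listed for `tf32` operands, and it requires compute capability 8.0 or higher):

```cuda
#include <mma.h>
using namespace nvcuda;

__global__ void closest_to_float()
{
    // Multiplicands use the tf32 precision tag; the accumulator is plain f32.
    wmma::fragment<wmma::matrix_a, 16, 16, 8, wmma::precision::tf32, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 8, wmma::precision::tf32, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 8, float> acc_frag;

    // (load / mma_sync / store elided here; see the fuller sketch below.)
    (void) a_frag; (void) b_frag; (void) acc_frag;
}
```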
edit: This answer may need some qualification. A 2020 NVIDIA blog post introducing TF32 describes the TF32 Tensor Cores as taking FP32 inputs and producing FP32 results, with the multiplicands rounded internally to TF32's reduced mantissa precision. So, if you can pretend your multiplicands are TF32 (and I'm not sure you actually can), then, in a sense, single-precision floating-point is usable, though the accuracy may be lower than expected.
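For what it's worth, the `nvcuda::wmma` API does let you express this "pretend" pattern: you load FP32 data into `tf32`-typed multiplicand fragments and round each element to TF32 yourself with `wmma::__float_to_tf32()`. A minimal sketch, assuming one warp computing a single 16x16 output tile from an 8-deep slice of FP32 matrices A and B, on compute capability 8.0 or newer:

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp multiplies a 16x8 slice of A (row-major) by an 8x16 slice of B
// (col-major), accumulating into a 16x16 tile of C, all stored as plain float.
__global__ void wmma_tf32_sketch(const float* A, const float* B, float* C,
                                 int lda, int ldb, int ldc)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 8, wmma::precision::tf32, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 8, wmma::precision::tf32, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 8, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);

    // Load FP32 data, then explicitly round every element to TF32 precision -
    // this is the "pretend your multiplicands are TF32" step.
    wmma::load_matrix_sync(a_frag, A, lda);
    wmma::load_matrix_sync(b_frag, B, ldb);
    for (int i = 0; i < a_frag.num_elements; ++i)
        a_frag.x[i] = wmma::__float_to_tf32(a_frag.x[i]);
    for (int i = 0; i < b_frag.num_elements; ++i)
        b_frag.x[i] = wmma::__float_to_tf32(b_frag.x[i]);

    // D = A * B + C, accumulated in FP32.
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(C, c_frag, ldc, wmma::mem_row_major);
}
```

Note that the individual products are still computed at TF32 precision, so the result will generally differ from what a true FP32 WMMA operation would give.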