I'm running NVIDIA's sample code (the code can be found on GitHub). I know from the CUDA Toolkit documentation that, by using `CUBLAS_TENSOR_OP_MATH`, the code "allows the library to use Tensor Core operations whenever possible".
In the documentation I've also found that `CUBLAS_DEFAULT_MATH` prevents the library from using Tensor Core operations; however, the sample code doesn't use that setting.
What is the default behavior for WMMA? Will it be executed on CUDA cores, or is there a possibility that Tensor Cores do the computation?
wmma instructions can only use (i.e., execute on) Tensor Core hardware. They cannot execute on any other type of hardware. For this reason, when compiling CUDA device code with wmma instructions, you must target an architecture (cc7.x, currently) that has Tensor Core hardware. Furthermore, such code will only run correctly on a cc7.x device (currently).
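For illustration, here is a minimal sketch built on the `nvcuda::wmma` API (the kernel name and launch shape are my own choices, not from the sample). One warp computes a single 16x16x16 tile product, and `mma_sync` is the step that executes on Tensor Core hardware:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes D = A * B for a single 16x16x16 tile.
// Must be compiled for a Tensor Core target, e.g.: nvcc -arch=sm_70
__global__ void wmma_tile_gemm(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);               // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // Tensor Core MMA
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}
```

Launch it with exactly one warp, e.g. `wmma_tile_gemm<<<1, 32>>>(d_a, d_b, d_d);`. Compiling this for a pre-cc7.x target fails, which is the point above: there is no CUDA-core fallback for these instructions.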
The CUBLAS variables you refer to affect usage of the CUBLAS API. They have no connection to wmma instructions that you code yourself.
Under the hood, the CUBLAS library has multiple code paths. The CUBLAS variables you refer to may affect which code path the library chooses. Some of those code paths use wmma instructions or equivalent Tensor Core usage; other code paths perform the same operation (at a high level, e.g. matrix-matrix multiply) without using wmma instructions.
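As a sketch of what that looks like from the API side (buffer setup and error checking omitted; the function name and the use of `cublasGemmEx` here are illustrative, not taken from the sample):

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Opt a GEMM into Tensor Core code paths via the handle's math mode.
// Assumes d_A (m x k), d_B (k x n), d_C (m x n) are column-major device
// buffers, already allocated and filled.
void gemm_tensor_op(cublasHandle_t handle,
                    const half *d_A, const half *d_B, float *d_C,
                    int m, int n, int k) {
    // CUBLAS_TENSOR_OP_MATH permits Tensor Core code paths;
    // CUBLAS_DEFAULT_MATH (the default) disallows them.
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    // The library may now dispatch a Tensor Core algorithm if the problem
    // shape and types are eligible; otherwise it performs the same
    // matrix-matrix multiply on a non-Tensor-Core path.
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                 &alpha, d_A, CUDA_R_16F, m,
                 d_B, CUDA_R_16F, k,
                 &beta, d_C, CUDA_R_32F, m,
                 CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```

Either way, the math-mode setting only steers the library's internal choice; it does not change what a wmma instruction you write yourself can run on.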