Tensor cores can be accessed programmatically through the WMMA interface in CUDA (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma and https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/). Recently, with the Ampere generation of cards, NVIDIA announced the ability to perform tensor operations on sparse matrices, as seen here: https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/
The format presented appears to take pairs of elements together with their positions within four-element segments (2-bit indices). However, looking at the WMMA documentation, I can't find any mention of this or of how to access those special tensor core operations. The announcement page for this functionality doesn't shed any light on it either, AFAICT.
How do I access sparse tensor core functionality in CUDA?
The blog post in your question links to the following paper: Accelerating Sparse Deep Neural Networks https://arxiv.org/pdf/2104.08378.pdf
In Section 3.2 it says
Sparse tensor operations can be performed manually using the PTX instruction mma.sp, which is explained in Section 9.7.13.5 of the PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-for-sparse-mma
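To make the format from the question concrete, here is a minimal host-side sketch of 2:4 compression: each group of four values keeps two of them plus a 2-bit position for each. The names Compressed24, compress_2of4 and expand_2of4 are hypothetical, and so is the choice of keeping the two largest-magnitude entries. The way the two indices are packed into a nibble is illustrative only; the exact per-thread register and metadata layout that mma.sp expects is defined in the PTX documentation and is not reproduced here. The code is plain C++ and should also compile as CUDA host code.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// One compressed group: the two kept values (in original order) and a
// metadata nibble holding their 2-bit positions within the group of four.
// Bits 0-1: position of values[0]; bits 2-3: position of values[1].
// NOTE: this packing is an assumption for illustration, not the mma.sp layout.
struct Compressed24 {
    std::array<float, 2> values;
    std::uint8_t meta;
};

// Keep the two largest-magnitude entries of a four-element group.
// Ties are resolved in favour of the lower index.
Compressed24 compress_2of4(const std::array<float, 4>& group) {
    // Index of the largest-magnitude entry.
    int best = 0;
    for (int i = 1; i < 4; ++i) {
        if (std::fabs(group[i]) > std::fabs(group[best])) best = i;
    }
    // Index of the largest-magnitude entry among the remaining three.
    int next = (best == 0) ? 1 : 0;
    for (int i = 0; i < 4; ++i) {
        if (i != best && std::fabs(group[i]) > std::fabs(group[next])) next = i;
    }
    // Store the kept entries in their original order.
    const int lo = (best < next) ? best : next;
    const int hi = (best < next) ? next : best;
    return {{group[lo], group[hi]}, static_cast<std::uint8_t>(lo | (hi << 2))};
}

// Rebuild the four-element group, with zeros in the two dropped positions.
std::array<float, 4> expand_2of4(const Compressed24& c) {
    std::array<float, 4> out{};  // zero-initialised
    out[c.meta & 0x3] = c.values[0];
    out[(c.meta >> 2) & 0x3] = c.values[1];
    return out;
}
```

For the group {0, 3, 0, -1.5}, this keeps 3 and -1.5 at positions 1 and 3, so the metadata nibble is 0b1101, and expand_2of4 returns the original group. In practice, libraries such as cuSPARSELt handle the compression step, so a hand-written version like this is mainly useful for understanding what the metadata operand of mma.sp encodes.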