Maximum number of concurrent kernels & virtual code architecture


So I found this wikipedia resource

Maximum number of resident grids per device (Concurrent Kernel Execution)

which, for each compute capability, lists a number of concurrent kernels — which I assume to be the maximum number of kernels that can execute concurrently.

Now I am getting a GTX 1060 delivered, which according to this NVIDIA CUDA resource has a compute capability of 6.1. From what I have learned about CUDA so far, you can specify the virtual compute architecture of your code at compile time with NVCC's -arch=compute_XX flag.
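For reference, this is roughly how the virtual and real architectures are specified with nvcc (the file name `kernels.cu` is just a placeholder):

```shell
# Target the compute_61 virtual architecture, generating SASS for sm_61 hardware
nvcc -arch=compute_61 -code=sm_61 kernels.cu -o kernels

# Common shorthand: -arch=sm_61 implies compute_61 as the virtual architecture
nvcc -arch=sm_61 kernels.cu -o kernels
```

The -arch flag only controls which instruction set and feature level the compiler targets; it is a compilation setting, not a hardware switch.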

So will my GPU be hardware constrained to 32 concurrent kernels or is it capable of 128 with the -arch=compute_60 flag?


2 Answers

BEST ANSWER

According to Table 13 in the NVIDIA CUDA Programming Guide, compute capability 6.1 devices have a maximum of 32 resident grids, i.e. 32 concurrent kernels.

Even if you use the -arch=compute_60 flag, you will be limited to the hardware limit of 32 concurrent kernels. Choosing particular architectures to compile for does not allow you to exceed the hardware limits of the machine.
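As a sketch of what this means in practice: the runtime reports only *whether* concurrent kernel execution is supported (via the `concurrentKernels` device property), not the resident-grid count, which comes from the programming guide's table. Launching more grids than the limit does not fail — the extra grids simply queue until resident slots free up. (The kernel and stream count below are illustrative, not from the question.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial placeholder kernel for illustration
__global__ void busy() {}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // 1 if the device can run kernels concurrently at all; the numeric
    // resident-grid cap (32 on CC 6.1) is a separate hardware limit.
    printf("Concurrent kernels supported: %d\n", prop.concurrentKernels);

    const int N = 64;  // deliberately more launches than the 32-grid limit
    cudaStream_t streams[N];
    for (int i = 0; i < N; ++i) {
        cudaStreamCreate(&streams[i]);
        // Grids beyond the resident limit are queued, not rejected
        busy<<<1, 1, 0, streams[i]>>>();
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < N; ++i) cudaStreamDestroy(streams[i]);
    return 0;
}
```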


Adding to the accepted answer: it is now Table 15 in the NVIDIA CUDA C Programming Guide as of April 2022, with the latest CUDA version being 12.1. Alternatively, you can simply search for "Technical Specifications per Compute Capability" in the docs.