CUDA recursion depth


When using Dynamic Parallelism in CUDA, you can implement recursive algorithms like mergeSort. I have implemented it, but my program doesn't work for inputs greater than blah.
My question is: how deep in the recursion tree can the implementation go? Is there any limitation? (My program works fine for smaller inputs.)


Best answer

From Professional CUDA C Programming:

The maximum nesting depth of dynamic parallelism is limited to 24, but in reality most kernels will be limited by the amount of memory required by the device runtime system at each new level . . .
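
To make the limit concrete, here is a minimal sketch (not the asker's mergeSort) of a kernel that recurses through dynamic parallelism while carrying an explicit depth counter, so it stops well before the nesting limit. The names `recurse`, `deepestLevelReached`, and `maxDepth` are hypothetical, the host launch is counted as level 0 by assumption, and the build line assumes a device and toolkit that support dynamic parallelism (compute capability 3.5 or later, relocatable device code enabled).

```cuda
// Assumed build line: nvcc -rdc=true recursion_depth.cu -o recursion_depth -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical recursive kernel: every level records its own depth and,
// while the guard allows it, launches exactly one child grid.
__global__ void recurse(int depth, int maxDepth, int *deepest)
{
    atomicMax(deepest, depth);               // remember the deepest level seen
    if (depth + 1 <= maxDepth) {             // explicit guard against runaway nesting
        recurse<<<1, 1>>>(depth + 1, maxDepth, deepest);
    }
}

// Runs the recursion and returns the deepest level reached, or -1 on error.
int deepestLevelReached(int maxDepth)
{
    int *dDeepest = nullptr;
    int hDeepest = -1;
    if (cudaMalloc(&dDeepest, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dDeepest, 0, sizeof(int));

    recurse<<<1, 1>>>(0, maxDepth, dDeepest);

    // A parent grid is not complete until its children are, so this waits
    // for the whole recursion tree and surfaces any launch failure.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        cudaMemcpy(&hDeepest, dDeepest, sizeof(int), cudaMemcpyDeviceToHost);
    } else {
        fprintf(stderr, "recursion failed: %s\n", cudaGetErrorString(err));
    }
    cudaFree(dDeepest);
    return hDeepest;
}

int main()
{
    // One of the device-runtime limits that costs memory: the size of the
    // buffer for child launches that have not started running yet.
    size_t pending = 0;
    if (cudaDeviceGetLimit(&pending, cudaLimitDevRuntimePendingLaunchCount) == cudaSuccess) {
        printf("pending launch count limit: %zu\n", pending);
    }

    printf("deepest level reached: %d\n", deepestLevelReached(8));
    return 0;
}
```

For a real recursive sort, the same idea applies: pass the depth down with each child launch and switch to a non-recursive path (for example, sorting the remaining range inside the current thread block) once the counter reaches your chosen cap. If large inputs still fail below the cap, checking the error returned after the launch usually shows whether the device runtime ran out of reserved memory rather than nesting depth.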