Does nvcc support tail call optimization in dynamic parallelism?

217 Views Asked by peteraldaron At 29 June 2025 at 02:11

The CUDA Programming Guide, section C.4.3.1.2 "Nesting and Synchronization Depth", states:

"An optimization is permitted where the system detects that it need not reserve space for the parent's state in cases where the parent kernel never calls cudaDeviceSynchronize(). In this case, because explicit parent/child synchronization never occurs, the memory footprint required for a program will be much less than the conservative maximum. Such a program could specify a shallower maximum synchronization depth to avoid over-allocation of backing store"
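For context, the "shallower maximum synchronization depth" in that passage is set from host code with `cudaDeviceSetLimit` and `cudaLimitDevRuntimeSyncDepth`, before the top-level kernel is launched. This is a minimal sketch that assumes a legacy (CDP1-style) dynamic-parallelism setup. The helper name `request_sync_depth` is hypothetical. Newer toolkits or devices may reject this limit, so the sketch returns the error instead of assuming success:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: ask the runtime to reserve backing store for a
// shallower synchronization depth. Call it before the top-level kernel
// that uses dynamic parallelism is launched from the host.
cudaError_t request_sync_depth(size_t depth, size_t* applied) {
  cudaError_t err = cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, depth);
  if (err != cudaSuccess) {
    return err;  // e.g. the limit may be unsupported on this device/toolkit
  }
  // Read back the value the runtime actually recorded.
  return cudaDeviceGetLimit(applied, cudaLimitDevRuntimeSyncDepth);
}
```

A program that never calls `cudaDeviceSynchronize()` from device code might call `request_sync_depth(1, &applied)` at the start of `main` and check the returned status.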

Does the quoted passage mean that the compiler supports tail-call optimization under dynamic parallelism? For example, consider a kernel that recursively launches itself:

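#include <cstdio>

// Each thread launches one child grid as its last statement (tail position).
__global__ void kernel(int layer){
  if(layer>65535){
    return;
  }
  printf("layer=%d\n",layer);
  kernel<<<1,1>>>(layer+1);  // device-side launch; requires -rdc=true
}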

It is launched from the host with:

   kernel<<<1,1>>>(0);
   cudaDeviceSynchronize();  // wait for the whole chain and flush device printf output

If the compiler can optimize the tail recursion, is the chain still limited by the maximum nesting depth of 24, given that "parent/child synchronization never occurs"? If it is not limited, how do I enable this optimization in the compiler?
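One way to check empirically how far the chain gets is to instrument the kernel. This is a minimal diagnostic sketch. It assumes compilation with relocatable device code (e.g. `nvcc -rdc=true probe.cu -lcudadevrt`). The names `g_max_layer`, `g_launch_error`, and `measure_depth` are hypothetical. The sketch drops the `printf` so the output is not flooded. It records the deepest layer that started running and keeps the first device-side launch error reported by `cudaGetLastError()`:

```cuda
#include <cuda_runtime.h>

// Hypothetical diagnostics: deepest layer that started, first launch error.
__device__ int g_max_layer = -1;
__device__ int g_launch_error = 0;  // 0 == cudaSuccess

__global__ void kernel(int layer) {
  atomicMax(&g_max_layer, layer);  // record how deep the chain got
  if (layer > 65535) {
    return;
  }
  kernel<<<1, 1>>>(layer + 1);  // same tail-position launch as the question
  cudaError_t err = cudaGetLastError();  // device-runtime error check
  if (err != cudaSuccess) {
    // Keep only the first error observed anywhere in the chain.
    atomicCAS(&g_launch_error, 0, static_cast<int>(err));
  }
}

// Runs the chain once; returns the host-side status and reports the
// deepest layer reached plus the first device-side launch error code.
cudaError_t measure_depth(int* deepest, int* device_error) {
  const int no_layer = -1, no_error = 0;
  cudaMemcpyToSymbol(g_max_layer, &no_layer, sizeof(int));
  cudaMemcpyToSymbol(g_launch_error, &no_error, sizeof(int));
  kernel<<<1, 1>>>(0);
  cudaError_t status = cudaGetLastError();
  if (status == cudaSuccess) {
    status = cudaDeviceSynchronize();
  }
  cudaMemcpyFromSymbol(deepest, g_max_layer, sizeof(int));
  cudaMemcpyFromSymbol(device_error, g_launch_error, sizeof(int));
  return status;
}
```

Printing `deepest` and `cudaGetErrorString(static_cast<cudaError_t>(device_error))` from `main` shows whether the chain stopped short of layer 65536 and, if it did, which error the device runtime reported.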

Thanks!
