Using a loop in a CUDA graph

586 Views Asked by Jakub Mitura At 28 June 2025 at 02:14

I have kernels A, B, and C, which need to be executed sequentially.

A->B->C

They are executed in a while loop until some condition is met.

while(predicate) {
    A->B->C
}

The while loop may execute anywhere from 3 to 2000 times; the information that the loop should stop is produced by kernel C.

Since the execution consists of many invocations of relatively small kernels, a CUDA graph sounds like a good idea. However, the CUDA graph implementations I have seen are all linear or tree-like, without loops.

In general, if a loop is not possible, a long chain of 2000 kernels with the possibility of an early stop triggered from kernel C would also be OK. But is it possible to stop graph execution at some point by a call from inside a kernel?


There is 1 best solution below

einpoklum On 17 January 2022 at 13:43 BEST ANSWER

CUDA graphs have no conditionals. A vertex of the graph is visited/executed when its predecessors are complete, and that's that. So, fundamentally, you cannot do this with a CUDA graph.

What can you do?

Have a smaller graph for the loop iteration, and repeatedly schedule it.
Have A, B and C each start by checking the loop predicate and skip all work once it no longer holds (that is, once C has signalled that the loop should stop). With that in place, you can schedule many instances of A->B->C->A->B->C etc., which, from some point on, will do nothing.
Don't rely on the CUDA graphs API. It's not a general-purpose parallel execution mechanism. :-(
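
To make the first two options concrete, here is a minimal sketch that combines them: one A->B->C iteration is captured into a small graph, the host relaunches it in batches, and every kernel returns immediately once a stop flag is set. Everything named here is an assumption for illustration only: kernelA/kernelB/kernelC are empty stand-ins for the real kernels, the two-int state buffer, the stop_after condition and the batch parameter are hypothetical, and error checking is left out for brevity. It also assumes a toolkit that provides cudaGraphInstantiateWithFlags (CUDA 11.4 or newer).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the real kernels A, B and C.
// state[0] = number of completed iterations, state[1] = stop flag set by C.
__global__ void kernelA(int* state) {
    if (state[1]) return;   // loop already finished: skip all work
    // ... real work of A would go here ...
}

__global__ void kernelB(int* state) {
    if (state[1]) return;
    // ... real work of B would go here ...
}

__global__ void kernelC(int* state, int stop_after) {
    if (state[1]) return;
    state[0] += 1;                              // one more iteration completed
    if (state[0] >= stop_after) state[1] = 1;   // made-up stop condition
}

// Runs the A->B->C loop and returns the number of iterations that did real work.
// 'batch' is how many graph launches are queued between host-side flag checks.
int run_loop(int stop_after, int max_iters, int batch) {
    assert(batch > 0);

    int* d_state = nullptr;
    cudaMalloc(&d_state, 2 * sizeof(int));
    cudaMemset(d_state, 0, 2 * sizeof(int));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture a single iteration A->B->C into a graph.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    kernelA<<<1, 1, 0, stream>>>(d_state);
    kernelB<<<1, 1, 0, stream>>>(d_state);
    kernelC<<<1, 1, 0, stream>>>(d_state, stop_after);
    cudaStreamEndCapture(stream, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiateWithFlags(&exec, graph, 0);

    int h_state[2] = {0, 0};
    int launched = 0;
    while (!h_state[1] && launched < max_iters) {
        // Queue up to 'batch' iterations; any that run after the flag is set are no-ops.
        int remaining = max_iters - launched;
        int n = remaining < batch ? remaining : batch;
        for (int i = 0; i < n; ++i) cudaGraphLaunch(exec, stream);
        launched += n;

        // Read back the counter and the stop flag before deciding whether to continue.
        cudaMemcpyAsync(h_state, d_state, sizeof(h_state),
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);
    }

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d_state);
    return h_state[0];
}
```

For example, run_loop(5, 2000, 4) queues two batches of four launches; the last three launches of the second batch hit the guard and return right away. A larger batch means fewer host round trips at the cost of more wasted no-op launches after the stop, so the right value depends on how cheap the kernels are.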
