I have a project with thousands of threads, and I want to use Nsight Systems to profile the CUDA code. However, loading the report takes a while, which I believe is due to the large amount of thread information. The threads also add a lot of visual clutter that I don't currently care about.
Is there a way to disable or limit the collection of thread information before loading a report in the Nsight Systems GUI?
If profiling through the CLI
Check the -s/--sample and --cpuctxsw options for the profile or start commands (see the Nsight Systems documentation). You can set both to none to minimize the amount of information collected from the CPU side.
If profiling a Linux target: also check the -t/--trace option for the profile or launch commands. Essentially, you want to exclude osrt from the trace options; it is enabled by default.
If you want to collect only CUDA events, you can use
nsys profile -t cuda -s none --cpuctxsw=none <app>
If profiling through the GUI
You can deselect the "Collect CPU IP/backtrace samples" and "Collect CPU context switch trace" boxes.
If profiling a Linux target: you can additionally deselect the "Collect OS runtime libraries trace" box.
Once the data has been collected, it is not possible to exclude it from rendering in the GUI. You can minimize threads, or hide them by right-clicking on "Threads" -> "Show less".
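For the CLI route, the options mentioned above can be wrapped in a small launcher so the CPU-side collection is always turned off. This is only a minimal sketch: the helper name build_nsys_command and the ./my_app path are made up for illustration, and it assumes nsys is on your PATH and accepts the long-form --trace, --sample and --cpuctxsw options named above.

```python
import shlex
import subprocess


def build_nsys_command(app_args, trace=("cuda",)):
    """Build an `nsys profile` command line that skips CPU-side collection.

    `build_nsys_command` is a hypothetical helper, not part of Nsight Systems.
    `trace` lists the APIs to trace; leaving out "osrt" avoids the
    OS runtime libraries trace that is otherwise enabled by default.
    """
    cmd = [
        "nsys", "profile",
        "--trace=" + ",".join(trace),  # e.g. --trace=cuda
        "--sample=none",               # no CPU IP/backtrace sampling
        "--cpuctxsw=none",             # no CPU context switch trace
    ]
    cmd += list(app_args)              # the application and its arguments
    return cmd


if __name__ == "__main__":
    # "./my_app" is a placeholder for your own executable.
    cmd = build_nsys_command(["./my_app", "--iters", "10"])
    print(shlex.join(cmd))
    # Uncomment to actually launch the profiler:
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a string avoids shell quoting problems if the application arguments contain spaces.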