I am just getting started with CUDA development and am trying to profile my code. I expected to run the nvprof tool for profiling, but I get the following error:
======== Warning: This version of nvprof doesn't support the underlying device, GPU profiling skipped
After some searching, I found that nvprof is legacy and that profiling should now be done with the Nsight Systems CLI. Running nsys nvprof ./myapp generates two files: report1.nsys-rep and report1.sqlite. How can I use these to obtain profiling information about my code?
Environment:
WSL with Ubuntu 20.04
NVIDIA Nsight Systems version 2023.1.2.43-32377213v0
Nvprof: Release version 10.1.243 (21)
NVCC: Cuda compilation tools, release 10.1, V10.1.243
I expect to obtain information similar to what nvprof reports.

I have only tried this command for profiling: nsys nvprof ./myapp. I would like to know whether it is the correct one or whether there are better variants.
Output of nsys profile --stats=true ./diverged:
Generating '/tmp/nsys-report-04e5.qdstrm'
[1/8] [========================100%] report2.nsys-rep
[2/8] [========================100%] report2.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ---------- ---------- -------- --------- ----------- --------------
74.7 364907400 6 60817900.0 72485919.0 4489170 100201745 42231058.9 poll
24.3 118728446 345 344140.4 81962.0 541 10034413 1039273.8 ioctl
0.6 2840826 9 315647.3 449904.0 2254 535093 236455.8 read
0.2 920219 2 460109.5 460109.5 105991 814228 500799.2 sem_timedwait
0.1 471795 2 235897.5 235897.5 70382 401413 234074.3 pthread_create
0.1 310682 25 12427.3 8907.0 2785 95078 18330.8 mmap
0.0 84580 9 9397.8 10049.0 1473 15419 4316.1 open
0.0 80611 13 6200.8 4559.0 1382 17002 5451.1 fopen
0.0 65704 3 21901.3 21310.0 20649 23745 1630.5 write
0.0 48833 26 1878.2 70.5 60 46898 9182.3 fgets
0.0 18413 6 3068.8 1738.0 1182 8455 2815.7 fclose
0.0 8245 1 8245.0 8245.0 8245 8245 0.0 pipe2
0.0 7233 2 3616.5 3616.5 1853 5380 2494.0 munmap
0.0 6662 5 1332.4 1533.0 351 1853 579.3 fcntl
[5/8] Executing 'cuda_api_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain CUDA trace data.
[6/8] Executing 'cuda_gpu_kern_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain CUDA kernel data.
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain GPU memory data.
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain GPU memory data.
nvprof is a legacy tool and will not be receiving new features. It would be best to switch to Nsight Systems or Nsight Compute, depending on your profiling goals.
Unless you have a specific profiling goal, the suggested strategy is to start with Nsight Systems to determine system bottlenecks and identify the kernels that affect performance the most. As a second step, you can use Nsight Compute to profile the identified kernels and find ways to optimize them.
If you are familiar with nvprof and want to keep using it, Nsight Systems supports the nvprof command. You can find more information in the documentation section Migrating from NVIDIA nvprof, or from nsys nvprof --help.
Regarding the use of the .nsys-rep file, you can view its content using the Nsight Systems GUI, available for Windows, Linux (x86_64, SBSA), and Mac. That means you can collect a profile on your target machine, share it, and view it on other machines too. For example, you can download the Windows Host to install the GUI.
You can extract profiling information on a terminal by using the nsys stats [3] and nsys analyze [4] commands. Both commands accept either an .nsys-rep file or an .sqlite file as input. .sqlite files can also be used as conventional database files, which would probably be needed for more advanced use cases.
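As a rough illustration of treating the .sqlite export as an ordinary database, the sketch below uses only Python's standard sqlite3 module to list the tables in a file and count their rows. The helper names list_tables and row_counts are hypothetical, and the code assumes nothing about which tables Nsight Systems writes; it simply reports whatever the file contains, which can be a quick way to see whether any CUDA data was captured at all.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name.

    Hypothetical helper: it reads SQLite's built-in sqlite_master catalog,
    so it works on any SQLite file, not only Nsight Systems exports.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def row_counts(db_path):
    """Return a dict mapping each table name to its number of rows."""
    counts = {}
    conn = sqlite3.connect(db_path)
    try:
        for name in list_tables(db_path):
            # Table names come from sqlite_master; quote them as identifiers
            # (doubling any embedded double quotes) rather than interpolating raw.
            quoted = '"' + name.replace('"', '""') + '"'
            (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            counts[name] = count
    finally:
        conn.close()
    return counts
```

Calling row_counts("report1.sqlite") on the file from the question would show which tables hold data. Note that sqlite3.connect creates an empty file if the path does not exist, so check the path first.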