How to capture GPU data when profiling TensorFlow code with nvprof?


I would like to profile the training loop of a transformer model written in TensorFlow on a multi-GPU system. Since the code doesn't support TF2, I cannot use the built-in (but still experimental) profiler. Therefore, I would like to use nvprof + nvvp (CUDA 10.1, driver 418).
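
For reference, here is a sketch of the invocation I'm using (train.py is a placeholder for the actual entry point):

    # Placeholder entry point; replace train.py with the real training script.
    # --export-profile writes a timeline file that can be opened in nvvp.
    nvprof --export-profile timeline.nvvp python train.py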

I can profile the code without any errors; however, when I examine the results in nvvp, there is no data for the GPUs. I don't know what causes this, as nvidia-smi clearly shows that the GPUs are being utilized.

This thread seems to describe the same issue, but there is no solution. Following the suggestions in this question, I ran cuda-memcheck on the code, which yielded no errors.
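
For completeness, the cuda-memcheck run looked like this (again with train.py standing in for the actual script):

    # Wraps the whole Python process and reports device-side memory errors, if any.
    cuda-memcheck python train.py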

I have tried running nvprof with additional command-line arguments, such as --analysis-metrics (no difference) and --profile-child-processes (warns that it cannot capture GPU data), to no avail.
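
The child-process run looked roughly like this (nvprof wants a %p placeholder in the output file name so that each child process gets its own file):

    # %p expands to the PID of each profiled child process.
    nvprof --profile-child-processes -o timeline_%p.nvvp python train.py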

Could someone please help me understand why I cannot capture GPU data and how I can fix this?

Also, why are there so few resources on profiling deep neural networks? Given how long training takes, it seems especially important to make full use of all available computing resources.

Thank you!

1 Answer

Consider adding the command-line argument --unified-memory-profiling off.
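
For example (a sketch, with train.py standing in for the actual training script):

    # Disabling unified-memory profiling can work around cases where nvprof
    # otherwise records no GPU timeline data.
    nvprof --unified-memory-profiling off -o timeline.nvvp python train.py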