Discrepancy between Nvidia-smi Memory Usage and PyTorch Memory Reporting on GPU


I'm encountering a discrepancy between the memory usage reported by nvidia-smi and what PyTorch reports when allocating GPU memory. For example, the following code snippet prints 2 MiB:

import torch

tensor = torch.randn(256, 256, device='cuda', dtype=torch.float16)
# Inspect the caching allocator's segments (private API)
snapshot = torch.cuda.memory._snapshot()
total_size = snapshot['segments'][0]['total_size'] / 1024 / 1024
print(total_size, "MiB")

This is expected, since according to the PyTorch documentation, the smallest memory segment that the caching allocator requests is 2 MiB.
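
For reference, the same numbers can also be read through the public allocator counters; this is just a sanity-check sketch I ran, assuming a single CUDA device and no other allocations:

import torch

tensor = torch.randn(256, 256, device='cuda', dtype=torch.float16)

# Bytes occupied by live tensors: 256 * 256 * 2 bytes = 128 KiB
print(torch.cuda.memory_allocated() / 1024, "KiB allocated")

# Bytes the caching allocator has reserved from the device: the 2 MiB segment
print(torch.cuda.memory_reserved() / 1024 / 1024, "MiB reserved")

memory_allocated() counts only live tensor storage, while memory_reserved() matches the segment total reported by the snapshot above.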

However, when checking nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:41:00.0 Off |                  Off |
|  0%   34C    P8    14W / 450W |    391MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |

It appears that 391 MiB are already in use. Where does this mismatch come from?
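
For what it's worth, I assume the gap can be narrowed down by comparing the driver-level view (which is what nvidia-smi reads) with PyTorch's own reservation; a rough sketch, assuming device 0 and that nothing else meaningful is running on the GPU:

import torch

tensor = torch.randn(256, 256, device='cuda', dtype=torch.float16)

# Driver-level view of the whole device, (free, total) in bytes, via cudaMemGetInfo
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print((total_bytes - free_bytes) / 1024 / 1024, "MiB used at the device level")  # roughly the nvidia-smi figure

# What the caching allocator itself has reserved
print(torch.cuda.memory_reserved(0) / 1024 / 1024, "MiB reserved by the allocator")  # the 2 MiB segment

The device-level figure covers everything outside the caching allocator as well (e.g. the CUDA context created on first use and any other processes), so I would not expect it to match memory_reserved(), but I'd like to understand what accounts for the difference.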
