Technique to measure GPU utilization over a given period of time


We run an HPC cluster with GPUs and would like to report the overall GPU utilization for each job. I know I can do this by sampling periodically in the background and averaging the results myself. Is there a tool that lets me start sampling at the beginning of the job, stop it at the end, and have it report the average GPU utilization over that span? As far as I can tell, nvidia-smi only samples at 1-second intervals. I am hoping for an option on it, or a similar tool, that offers this start/stop functionality. Note that a fixed sampling period won't work unless I can end it early and get the results up to that point, since you never know how long a job will run. I would appreciate any pointers or ideas.
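For reference, the "sample in the background and do the math" approach I have in mind looks roughly like the sketch below. It is only an illustration, not a recommended tool: `UtilizationSampler` and `read_gpu_utilization` are hypothetical names I made up, the 1-second default interval is arbitrary, and it assumes `nvidia-smi` is on the PATH and that its `utilization.gpu` query field is an acceptable measure of utilization. Sampling stops whenever `stop()` is called, so the job length does not need to be known in advance.

```python
import subprocess
import threading


def read_gpu_utilization():
    """Return the current utilization (percent) of each visible GPU.

    Assumes nvidia-smi is installed and on the PATH.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(line) for line in out.splitlines() if line.strip()]


class UtilizationSampler:
    """Hypothetical helper: polls read_fn in a background thread until stopped."""

    def __init__(self, read_fn=read_gpu_utilization, interval=1.0):
        self.read_fn = read_fn      # callable returning one value per GPU
        self.interval = interval    # seconds between samples
        self._sums = []             # running sum of samples, per GPU
        self._count = 0             # number of samples taken
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            values = self.read_fn()
            if not self._sums:
                self._sums = [0.0] * len(values)
            for i, value in enumerate(values):
                self._sums[i] += value
            self._count += 1
            # Sleep for one interval, but wake up at once if stop() is called.
            if self._stop.wait(self.interval):
                break

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        """Stop sampling and return the mean utilization per GPU."""
        self._stop.set()
        self._thread.join()
        if self._count == 0:
            return []
        return [total / self._count for total in self._sums]


# Usage (e.g. from a job prolog/epilog wrapper):
#   sampler = UtilizationSampler(interval=1.0).start()
#   ... run the job ...
#   print(sampler.stop())   # mean utilization per GPU, in percent
```

The reader function is passed in as a parameter so the averaging logic can be exercised without a GPU. The result is a plain arithmetic mean of the samples, so bursts shorter than the sampling interval can be missed.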
