How can I infer possible errors in a DNN job from its CPU/GPU/memory usage?

When a DNN job runs into an error, is there a way to write a program that detects the potential error (or warning) based only on the job's CPU, GPU, and memory usage? The solution doesn't need to be perfect; it's fine if it only covers certain situations.

Thanks!
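One possible starting point is a set of simple threshold rules applied to periodically sampled usage. The sketch below is a minimal, hypothetical example. The `Sample` fields, the `diagnose` function, and every threshold value are assumptions chosen for illustration, not tuned values. Collecting the samples (for example with `psutil` for host memory and CPU, and `nvidia-smi` for GPU utilization and memory) is not shown. It flags three patterns: GPU memory close to capacity (a likely precursor to an out-of-memory error), host memory that grows on every sample (a possible leak), and a GPU that stays idle for several samples (a possible hang if the CPU is also idle, or a data-loading stall if the CPU is busy).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One resource-usage sample of the DNN job (hypothetical format)."""
    cpu_percent: float       # process CPU utilization (can exceed 100 on multi-core)
    gpu_util_percent: float  # GPU compute utilization, 0-100
    gpu_mem_mb: float        # GPU memory currently in use
    gpu_mem_total_mb: float  # total GPU memory
    host_mem_mb: float       # resident host memory of the job


# Illustrative thresholds (assumptions, not tuned values).
GPU_MEM_NEAR_LIMIT = 0.95    # fraction of GPU memory considered "almost full"
IDLE_GPU_PERCENT = 5.0       # GPU utilization below this counts as idle
IDLE_CPU_PERCENT = 10.0      # CPU utilization below this counts as idle
IDLE_WINDOW = 5              # consecutive idle samples before warning
LEAK_MIN_GROWTH_MB = 500.0   # minimum host-memory growth to call it a leak


def diagnose(samples: List[Sample]) -> List[str]:
    """Return heuristic warnings for a time-ordered list of samples."""
    warnings: List[str] = []
    if not samples:
        return warnings

    # Rule 1: GPU memory almost full -> an out-of-memory error is likely.
    last = samples[-1]
    if last.gpu_mem_total_mb > 0 and last.gpu_mem_mb / last.gpu_mem_total_mb >= GPU_MEM_NEAR_LIMIT:
        warnings.append("possible GPU out-of-memory")

    # Rule 2: host memory grows on every sample by a large total amount -> possible leak.
    host = [s.host_mem_mb for s in samples]
    strictly_growing = all(b > a for a, b in zip(host, host[1:]))
    if len(host) >= 2 and strictly_growing and host[-1] - host[0] >= LEAK_MIN_GROWTH_MB:
        warnings.append("possible host memory leak")

    # Rule 3: GPU idle for the last IDLE_WINDOW samples.
    recent = samples[-IDLE_WINDOW:]
    if len(recent) == IDLE_WINDOW and all(s.gpu_util_percent < IDLE_GPU_PERCENT for s in recent):
        if all(s.cpu_percent < IDLE_CPU_PERCENT for s in recent):
            warnings.append("possible hang or deadlock (GPU and CPU idle)")
        else:
            warnings.append("possible input-pipeline stall (GPU idle, CPU busy)")

    return warnings


if __name__ == "__main__":
    # GPU memory at 7900/8000 MB and host memory rising 100 MB per sample.
    history = [Sample(120.0, 85.0, 7900.0, 8000.0, 4000.0 + 100 * i) for i in range(6)]
    print(diagnose(history))
```

Each rule only reports a *possible* cause. An idle GPU can also be legitimate (for example during checkpointing or CPU-side evaluation), so the thresholds and window size would need adjusting for a specific job.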
