When a DNN job encounters an error, is there any way I can write a program that detects the potential error (or warning) based only on the job's CPU, GPU, and memory usage? The solution doesn't need to be perfect; it's fine if it only covers certain situations.
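
To make this concrete, a rule-based check over sampled utilization might look like the sketch below. It is only an illustration of the kind of program in question, not a known solution: the `Sample` type, the `detect_anomalies` function, the warning names, and all thresholds are hypothetical, and collecting the samples (for example with psutil or nvidia-smi) is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One utilization reading for the job (hypothetical structure)."""
    cpu_pct: float  # CPU utilization, 0-100
    gpu_pct: float  # GPU utilization, 0-100
    mem_pct: float  # memory in use, 0-100


def detect_anomalies(samples: List[Sample],
                     idle_gpu_pct: float = 5.0,
                     window: int = 5,
                     mem_limit_pct: float = 95.0) -> List[str]:
    """Return warning labels derived only from utilization samples.

    All thresholds are assumed values that would need tuning per job.
    """
    warnings = []

    # Rule 1: GPU idle for the last `window` samples -> job may be stalled or dead.
    recent = samples[-window:]
    if len(recent) == window and all(s.gpu_pct < idle_gpu_pct for s in recent):
        warnings.append("gpu_idle")

    # Rule 2: latest memory reading near the limit -> possible out-of-memory risk.
    if samples and samples[-1].mem_pct >= mem_limit_pct:
        warnings.append("memory_near_limit")

    # Rule 3: memory strictly increasing across all samples -> possible leak.
    mems = [s.mem_pct for s in samples]
    if len(mems) >= window and all(b > a for a, b in zip(mems, mems[1:])):
        warnings.append("memory_growing")

    return warnings
```

Rules like these would only cover some situations, such as a hung data loader (GPU idle) or steadily growing memory, and could raise false alarms during normal phases like checkpointing or evaluation.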
Thanks!