I was wondering if there is a better way to get job statistics (such as CPU time, walltime, memory usage, etc.) in a PBS job script once the job completes. In my current setup, I have this line at the end of my PBS script:
qstat -f "${PBS_JOBID}"
The problem is that if the job fails or gets killed for some reason, this line won't be executed. Please let me know what other options I can use.
I greatly appreciate any help or advice, thanks!
You may find the tracejob script useful. It is available in PBS-derived batch scheduling systems. tracejob takes one argument, the JOB_ID, and one option, -n days, which indicates how many days back it should look in the log files for relevant stats.
Note on split submission and server hosts
Note that tracejob works only if the logs are accessible on the host where it is invoked. On some installations, the PBS server runs on one host, job submissions are performed on another, and the log files are stored on a file system local to the PBS server. In this case, tracejob would not work when run from the submission host.
Example
qstat fails since the job has completed, while tracejob works.
You can redirect stderr to /dev/null when executing tracejob to suppress the repeated messages it otherwise writes there.
In the above logs, the information that is not relevant to the question was replaced with capitalized words.
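The invocation described in this answer can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function name job_stats, the default of 3 days, and the job ID in the usage comment are made up for illustration, and it has to run on a host where the PBS server logs are readable.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): look up a finished job with tracejob.
# Usage: job_stats JOB_ID [DAYS]
job_stats() {
    job_id="$1"
    days="${2:-3}"   # how many days of logs to search; 3 is an arbitrary default

    # Refuse to run without a job ID.
    if [ -z "$job_id" ]; then
        echo "usage: job_stats JOB_ID [DAYS]" >&2
        return 2
    fi

    # tracejob may not be installed on submission-only hosts.
    if ! command -v tracejob >/dev/null 2>&1; then
        echo "tracejob not found on this host" >&2
        return 1
    fi

    # -n DAYS sets how far back tracejob searches the logs; stderr is
    # discarded to hide the messages it writes there.
    tracejob -n "$days" "$job_id" 2>/dev/null
}

# Example call with a placeholder job ID (replace with a real one):
# job_stats 12345.server 7
```

Because tracejob reads the log files rather than querying the running job, a helper like this can be called after the job has ended, whether it finished normally or was killed.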