I need to measure the wall time of a serial code running on our cluster. In exclusive mode, i.e., when no other user is using my node, the wall time of the code varies quite a lot, ranging from 2m30s to 3m20s. The code does the same thing in every run. I am wondering whether the large variance in wall time is caused by the GPFS file system, since the code reads and writes files stored on GPFS. My question: is there a tool that lets me view GPFS I/O performance and relate it to the performance of my code?
Thanks.
This is a very big question, so we need to narrow it down a bit. Let me ask some questions.
First, let us look at the output of the time command for a simple ls:
$ time ls
real 0m0.003s
user 0m0.001s
sys 0m0.001s
Wall clock time is the same as real time, which in your case is varying. The next debugging question is: do user time and system time also vary? If the GPFS file system code runs inside the kernel and takes a varying amount of time, you should see the sys time vary. If the sys time stays the same but the real time varies, then the program is spending time sleeping, waiting on something (for example, blocked on I/O). There are deeper ways to find the problem, but can you please clarify your question further?
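To make the real/user/sys comparison repeatable across runs, something like the following Python sketch could help. It is only an illustration, not a GPFS-specific tool: the function names time_command and spread are made up here, the repeat count of 5 is arbitrary, and it assumes a Unix-like node where os.times() reports CPU time for child processes.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (real, user, sys) in seconds, like `time`."""
    cpu_before = os.times()
    wall_start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - wall_start
    cpu_after = os.times()
    # children_* counts CPU time of child processes that have been waited for.
    user = cpu_after.children_user - cpu_before.children_user
    system = cpu_after.children_system - cpu_before.children_system
    return real, user, system


def spread(runs):
    """Return max - min of each component over a list of (real, user, sys)."""
    names = ("real", "user", "sys")
    return {n: max(r[i] for r in runs) - min(r[i] for r in runs)
            for i, n in enumerate(names)}


if __name__ == "__main__" and len(sys.argv) > 1:
    runs = [time_command(sys.argv[1:]) for _ in range(5)]  # 5 is arbitrary
    for real, user, system in runs:
        print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
    print("spread:", spread(runs))
```

Saved under a hypothetical name such as time_runs.py, it would be run as "python time_runs.py ./my_code". If the spread in real is large while the spreads in user and sys stay near zero, that points toward time spent waiting rather than computing.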