How can I see the detailed work of each node on a Rocks Cluster?

For a school project on matrix multiplication, I built a Rocks Cluster with one frontend and five compute nodes. Over MPI I send each node a partition of the matrices, the nodes multiply their parts, and then they send the results back. The command I run is:

mpirun -hostfile myhostfile ./myprogram

where myhostfile is a file listing the node names and their slot (thread) counts. My program works, and now I'm trying to analyze it.
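A sketch of what such a hostfile can look like, assuming Open MPI's hostfile syntax (`host slots=N` per line) and hypothetical Rocks-style node names; the file name `hostfile.example` and the `slot_summary` helper are also hypothetical. With Open MPI, when `-np` is omitted, mpirun starts one process per slot, so the per-node numbers are roughly how many busy processes `top` should show on each node.

```shell
#!/bin/sh
# Hypothetical hostfile in Open MPI syntax: "host slots=N" per line,
# where N is how many MPI processes mpirun may place on that host.
# Written to a separate name so a real myhostfile is not overwritten.
cat > hostfile.example <<'EOF'
compute-0-0 slots=4
compute-0-1 slots=4
compute-0-2 slots=2
EOF

# Print each host's slot count and the total. Hosts without an
# explicit slots= are shown as "unset" (Open MPI then picks a default).
slot_summary() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank and comment lines
        {
            n = 0
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=/) { split($i, kv, "="); n = kv[2] + 0 }
            if (n > 0) printf "%s %d\n", $1, n
            else       printf "%s unset\n", $1
            total += n
        }
        END { printf "total %d\n", total }' "$1"
}

slot_summary hostfile.example
```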

My question is: how can I see each node's cores/processors working on their tasks? Are all the processors busy, and is any of them overloaded? I tried to install the Vampir profiler and Intel VTune Amplifier, but I have trouble attaching them to my program when it is launched with the command above (other commands don't let me run my program on all threads of a node). Besides Ganglia, the only way I have found to check that my nodes are working is to log in to a node from the frontend and run top: while my program executes, I can see it from the number of threads, each at almost 100% CPU.
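A finer-grained version of the top check: on a compute node, Linux procps `ps` can report the core each MPI process is currently assigned to (`psr`), which shows directly whether two ranks are competing for one core. A minimal sketch; `core_map` is a hypothetical helper that reads the `ps` output on stdin. Note that `pcpu` in `ps` is CPU time divided by the process's elapsed time, not an instantaneous reading.

```shell
#!/bin/sh
# Sketch: from "ps -o pid=,psr=,pcpu=" output on stdin, show the core
# (psr) each process is assigned to and flag cores shared by several
# processes, a hint of oversubscription. core_map is hypothetical;
# ps's pcpu is cputime/elapsed time, not a live percentage.
core_map() {
    awk '
        NF == 3 {
            printf "pid %s core %s cpu %s%%\n", $1, $2, $3
            per_core[$2]++
        }
        END {
            for (c in per_core)
                if (per_core[c] > 1)
                    printf "core %s shared by %d processes\n", c, per_core[c]
        }'
}

# Usage on a compute node while the job runs (-C selects by command
# name, the trailing "=" drops the header line):
#   ps -C myprogram -o pid=,psr=,pcpu= | core_map
```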

Take a look at mpstat (part of the sysstat package).

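With no arguments, it shows the load aggregated over all cores, averaged since the node booted; add an interval and a count (for example mpstat 1 5, five one-second samples) to see current load instead.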

mpstat -P ALL shows the load of each core separately.
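To answer "are all processors working" without reading the whole table, the `Average:` rows of `mpstat -P ALL` can be filtered for cores that sat mostly idle. A minimal sketch, assuming sysstat's layout (CPU number in the second field, `%idle` in the last field of each `Average:` row); the `idle_cores` name and its 20% default threshold are hypothetical.

```shell
#!/bin/sh
# Sketch: report cores whose average %idle exceeds a threshold.
# Assumes sysstat's mpstat layout: "Average:" summary rows with the
# CPU number in field 2 and %idle in the last field. idle_cores and
# its default limit of 20 are hypothetical, not part of sysstat.
idle_cores() {
    awk -v limit="${1:-20}" '
        # Per-core summary rows only: skips the "CPU" header and "all".
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && ($NF + 0) > (limit + 0) {
            printf "cpu %s idle %s%%\n", $2, $NF
        }'
}

# Usage on a node while the job runs (five 1-second samples;
# LC_ALL=C keeps "." as the decimal separator):
#   LC_ALL=C mpstat -P ALL 1 5 | idle_cores 20
```

An empty result means every core stayed busier than the threshold during the sampling window.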

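This gives near-realtime stats for all your nodes at once (the 1 1 arguments make each refresh a fresh one-second sample rather than the since-boot average; add -P ALL for per-core lines):

watch "pdsh -w 'compute-01-[01-10]' mpstat 1 1"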

(replace compute-01-[01-10] with your compute nodes' names; pdsh must be installed on the frontend)
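If pdsh is not available, a plain ssh loop over the same hostfile does a similar job, one node after another. A minimal sketch, assuming passwordless ssh from the frontend to the compute nodes (which launching mpirun over a hostfile typically relies on already); `for_each_node` and the `RSH` override are hypothetical names.

```shell
#!/bin/sh
# Sketch: run one command on every host named in an MPI hostfile,
# one host at a time, with a header line before each host's output.
# Assumes passwordless ssh to the nodes. RSH is a hypothetical hook
# for swapping in another launcher (e.g. "echo" for a dry run).
for_each_node() {
    hostfile=$1
    shift
    # First field of each non-blank, non-comment line, deduplicated
    # while keeping the file's order.
    awk 'NF && $1 !~ /^#/ && !seen[$1]++ { print $1 }' "$hostfile" |
    while read -r host; do
        echo "=== $host ==="
        # ssh -n reads stdin from /dev/null, so it cannot swallow
        # the remaining host names that the loop is reading.
        ${RSH:-ssh -n} "$host" "$@"
    done
}

# Usage from the frontend: one 1-second per-core sample per node.
#   for_each_node myhostfile mpstat -P ALL 1 1
```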