Problem running UPC code over a network: connection refused


When I run UPC code over a network of 2 nodes with the -v option enabled to get detailed execution information, I notice that the master node (glitch.rutgers.edu) tries to connect to itself instead of connecting to its neighbouring nodes.

    /usr/bin/rsh glitch.rutgers.edu -l sharatds -n '/usr/bin/env'
    'GASNET_MAX_SEGSIZE='74344KB'' 'GASNET_VERBOSEENV='1'' '/cac/u01/sharatds/UPC_Tests/./upcMatrxMultplction_mpi' glitch.rutgers.edu 41449 -p4amslave -p4yourname glitch.rutgers.edu -p4rmrank 1
    glitch.rutgers.edu: Connection refused
    p0_5078:  p4_error: Child process exited while making connection to remote process on glitch.rutgers.edu: 0
    p0_5078: (45.046875) net_send: could not write to fd=4, errno = 32
    gasnetrun: unlinking gasnetrun_mpi-temp-4813/rsh gasnetrun_mpi-temp-4813/ssh gasnetrun_mpi-temp-4813/mpirun-rsh gasnetrun_mpi-temp-4813/mpirun-tmp

Why is this happening? Is there a configuration change that would fix it?

Thanks for your help

Best answer

This error is likely coming from rsh. You can confirm this by running an rsh command from the master node back to itself, for example "rsh glitch pwd" (my guess is that this will prompt you for a password).
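The check described above can be wrapped in a small shell helper. This is only a sketch: the function name `check_remote_shell` is made up for illustration, and it assumes the remote shell accepts a `HOST -n COMMAND` invocation, which is the form shown in the verbose output in the question. It reports whether the command ran without being refused or needing interaction.

```shell
#!/bin/sh
# check_remote_shell CMD HOST  (hypothetical helper, not part of UPC or GASNet)
# Runs `CMD HOST -n pwd` with stdin detached and reports whether it succeeded.
# A failure here suggests the launcher would fail the same way.
check_remote_shell() {
    cmd=$1
    host=$2
    if "$cmd" "$host" -n pwd </dev/null >/dev/null 2>&1; then
        echo "ok: $cmd to $host works without a prompt"
    else
        echo "fail: $cmd to $host was refused or needs interaction"
        return 1
    fi
}

# Example, run on the master node (path and host taken from the question):
#   check_remote_shell /usr/bin/rsh glitch.rutgers.edu
```

If the helper prints "fail" for the master node itself, the problem is most likely in the rsh setup on that host rather than in the UPC program.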