RSH connection refused while running MPI program


I'm trying to run MPI programs on 8 machines, but I get the following error:

connect to address 127.0.0.1 port 544: Connection refused
Trying krb4 rsh...
connect to address 127.0.0.1 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
lagrid02: Connection refused

When I run it with a machinefile option, I get the error "lagrid03: No route to host", where lagrid03 is the node neighbouring the master node.

How can I fix this?

On BEST ANSWER

Regarding your first error: is rsh running on all of the machines? You need either rsh or password-less ssh configured (and, for ssh, your MPI job launcher told to use it) before you can start jobs on different machines.
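One quick way to check this from the master node is to probe the relevant TCP ports on each machine. The following is a minimal Python sketch, assuming the standard port numbers (22 for ssh, 514 for rsh, and 544 for Kerberos rsh, which is the port shown in the error). The helper names port_open and probe are illustrative and not part of any MPI tool.

```python
import socket

# Standard service ports; 544 (kshell) is the port in the error message.
PORTS = {"ssh": 22, "rsh": 514, "kshell": 544}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, no route to host, timeouts and DNS failures.
        return False


def probe(hosts, ports=PORTS):
    """Map each host to the names of the services that accept connections."""
    return {
        host: [name for name, port in ports.items() if port_open(host, port)]
        for host in hosts
    }
```

Calling probe(["lagrid02", "lagrid03"]) on the master would then show, per node, which of the three services is actually listening. An empty list for a node means none of them could be reached.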

The second error indicates that the machine lagrid03 cannot be reached with the current network configuration. My guess is that you have an /etc/hosts entry with an IP address for lagrid03 but no interface configured on that network. For a more detailed answer, you'll need to post details of your network configuration.
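To check the /etc/hosts guess, it can help to list what each node name maps to. The Python sketch below parses text in /etc/hosts format and flags names that map to a loopback address. The first error shows rsh connecting to 127.0.0.1, which suggests (but does not prove) that lagrid02 resolves to loopback somewhere. The function names are hypothetical and the sample entries are made up for illustration.

```python
import ipaddress


def parse_hosts(text):
    """Parse /etc/hosts-style text into {hostname: [ip, ...]}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(ip)
    return table


def loopback_names(table):
    """Return the hostnames that map to a loopback address such as 127.0.0.1."""
    return sorted(
        name
        for name, ips in table.items()
        if any(ipaddress.ip_address(ip).is_loopback for ip in ips)
    )


# Made-up sample; on a real node, read the text from /etc/hosts instead.
SAMPLE = """\
127.0.0.1   localhost lagrid02
192.168.1.3 lagrid03   # example address, not from the question
"""

print(loopback_names(parse_hosts(SAMPLE)))  # prints ['lagrid02', 'localhost']
```

A node name that appears in this list is only reachable from the machine itself, so other nodes need an entry that points to its real interface address.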

On

The issue is with authentication. If you edit the /etc/pam.d/rsh file, move rlogin and rsh to the top, and make it look like this, it should work just fine:

# For root login to succeed here with pam_securetty, "rsh" must be listed in /etc/securetty.
auth     required  pam_nologin.so
auth     required  pam_securetty.so
auth     required  pam_env.so
auth     required  pam_rhosts_auth.so
account  include   system-auth
session  optional  pam_keyinit.so force revoke
session  include   system-auth
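
PAM processes the modules of each type in the order they appear in the file, so it can be useful to compare the order on each node against the listing above. This small Python sketch extracts the auth entries in file order. The helper auth_modules is hypothetical, and the sample text simply repeats the auth lines from this answer.

```python
def auth_modules(text):
    """Return the module (or included service) names of the 'auth' lines, in order."""
    modules = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        # A service file line is: type control module [arguments...]
        if len(fields) >= 3 and fields[0].lstrip("-") == "auth":
            i = 1
            # A bracketed control such as [success=1 default=ignore] may span fields.
            if fields[1].startswith("["):
                while i < len(fields) and not fields[i].endswith("]"):
                    i += 1
            if i + 1 < len(fields):
                modules.append(fields[i + 1])
    return modules


# The auth lines from the listing in this answer.
SAMPLE = """\
auth required pam_nologin.so
auth required pam_securetty.so
auth required pam_env.so
auth required pam_rhosts_auth.so
account include system-auth
"""

print(auth_modules(SAMPLE))
# prints ['pam_nologin.so', 'pam_securetty.so', 'pam_env.so', 'pam_rhosts_auth.so']
```

On a real node, the text would come from reading /etc/pam.d/rsh (which normally requires root).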