Parallel-SSH - how to close ssh channel after a certain time?

OK, it's possible that the answer to this question is simply "stop using parallel-ssh and write your own code using netmiko/paramiko. Also, upgrade to Python 3 already."

But here's my issue: I'm using parallel-ssh to hit as many as 80 devices at a time. These devices are notoriously unreliable and occasionally freeze after printing one or two lines of output. When that happens, the parallel-ssh code hangs for hours, and the script keeps running until I kill it. After one weekend I logged onto the VM that runs the scripts and found a job that had been stuck for 52 hours.

Here are the relevant parts of my first version, the one that hangs:

from pssh.pssh2_client import ParallelSSHClient
def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
  client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
  result = client.run_command(cmd, stop_on_errors=False)
  return result

Next I tried the channel_timeout option, because if it takes more than 4 minutes to get the command output, I know the device has frozen, and I need to move on and cycle it later in the script:

from pssh.pssh_client import ParallelSSHClient
def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
  client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, channel_timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
  result = client.run_command(cmd, stop_on_errors=False)
  return result

This version never connects to anything at all. Any advice? Apart from channel_timeout, I haven't found any way to kill an SSH session after a certain amount of time.
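Independent of any particular SSH library, the behaviour being asked for amounts to bounding the *total* time spent reading one host's output, not just each individual read. The sketch below is an illustration under assumptions, not parallel-ssh's API: plain sockets stand in for SSH channels (paramiko's `Channel` offers `settimeout()` and `recv()` with comparable semantics), and `read_until_deadline` is a hypothetical helper. Each `recv()` is only given the time that remains, so a device that prints a line or two and then freezes is reported as timed out instead of blocking forever.

```python
import socket
import time


def read_until_deadline(sock, deadline_s, bufsize=4096):
    """Hypothetical helper: read from sock until EOF or until deadline_s elapses.

    Returns (data, timed_out). Every recv() gets only the remaining time,
    so a peer that sends some output and then stalls cannot block past
    the overall deadline.
    """
    chunks = []
    end = time.monotonic() + deadline_s
    while True:
        remaining = end - time.monotonic()
        if remaining <= 0:
            return b"".join(chunks), True
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(bufsize)
        except socket.timeout:
            # Nothing arrived before the deadline: treat the device as frozen
            return b"".join(chunks), True
        if not chunk:
            # Empty read means the peer closed the stream: the command finished
            return b"".join(chunks), False
        chunks.append(chunk)


if __name__ == "__main__":
    # Simulate a frozen device: one line of output, then silence
    device, local_end = socket.socketpair()
    device.sendall(b"line 1\n")
    data, timed_out = read_until_deadline(local_end, deadline_s=0.5)
    print(data, timed_out)  # b'line 1\n' True
    device.close()
    local_end.close()
```

Wiring this into parallel-ssh would require access to the underlying channel objects, which this sketch does not assume.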

Answer:

The code creates the client object inside a function and then returns only the output of run_command, and that output holds remote channels to the SSH server.

Since the function never returns the client object, the client goes out of scope and is garbage collected by Python, which closes the connection.

Trying to use remote channels on a closed connection will never work. If you capture a stack trace of the stuck script, it will most likely show it hanging while using a remote channel or the connection.
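The mechanism described above can be reproduced without any SSH at all. In this sketch, `Client`, `Channel`, `remote_ssh_broken` and `remote_ssh_fixed` are hypothetical stand-ins, not parallel-ssh classes: the client owns a connection (a local socket pair) and closes it in `__del__`, while the channel it hands out references only the socket, not the client. The sketch relies on CPython's reference counting, which destroys the client as soon as the function returns. In this toy version the failure surfaces immediately as an `OSError`; the answer reports that in the real script it shows up as a hang.

```python
import socket


class Client:
    """Hypothetical client that owns a connection and closes it when destroyed."""

    def __init__(self):
        # A socket pair stands in for the SSH connection to one host
        self._local, self._remote = socket.socketpair()

    def run_command(self, cmd):
        # Pretend the remote side answers by echoing the command back
        self._remote.sendall(cmd.encode())
        # The channel keeps a reference to the socket, but NOT to the client
        return Channel(self._local)

    def __del__(self):
        # Garbage collecting the client tears down its connection
        self._local.close()
        self._remote.close()


class Channel:
    def __init__(self, sock):
        self._sock = sock

    def read(self):
        return self._sock.recv(1024)


def remote_ssh_broken(cmd):
    client = Client()
    return client.run_command(cmd)  # client is collected when this returns


def remote_ssh_fixed(cmd):
    client = Client()
    return client, client.run_command(cmd)  # caller keeps the client alive


if __name__ == "__main__":
    channel = remote_ssh_broken("uptime")
    try:
        channel.read()
    except OSError as exc:
        # Reading from an already-closed connection fails
        print("broken:", exc)

    client, channel = remote_ssh_fixed("uptime")
    print("fixed:", channel.read())
```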

Change your code to keep the client alive. Ideally, the client should also be reused across calls:

from pssh.pssh2_client import ParallelSSHClient

def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
  client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
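  result = client.run_command(cmd, stop_on_errors=False)
  # Return the client too, so it stays referenced (and its connections stay open)
  # while the caller reads from the output channels: client, result = remote_ssh(...)
  return client, result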
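One way to act on the advice to reuse the client is to cache one client per host list, so every call gets the same live object. This is a sketch under assumptions: `make_client` and `FakeClient` are hypothetical (with parallel-ssh, `make_client` would be the one place where `ParallelSSHClient` is constructed), and the host list is passed as a tuple because `functools.lru_cache` needs hashable arguments. The cache also holds a strong reference to each client, which is what keeps it from being garbage collected.

```python
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for an SSH client object."""

    def __init__(self, hosts, user):
        self.hosts = hosts
        self.user = user


def make_client(hosts, user):
    # Hypothetical factory: the only place a client is constructed
    return FakeClient(hosts, user)


@lru_cache(maxsize=None)
def get_client(hosts, user):
    """Return one shared client per (hosts, user) pair.

    hosts must be a tuple, since lru_cache requires hashable arguments.
    The cache keeps a strong reference, so the client lives until the
    process exits or get_client.cache_clear() is called.
    """
    return make_client(hosts, user)


if __name__ == "__main__":
    a = get_client(("10.0.0.1", "10.0.0.2"), "admin")
    b = get_client(("10.0.0.1", "10.0.0.2"), "admin")
    print(a is b)  # True: the same client object is reused
```

The trade-off of `maxsize=None` is an unbounded cache: every distinct host list keeps its client open, which is fine for a fixed device inventory but worth bounding if host lists vary a lot.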

Make sure you understand where the code is going wrong before jumping to conclusions that will not solve the issue, i.e. capture a stack trace of where it is hanging. The same code doing the same thing will break the same way.
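To capture that stack trace, Python 3's standard-library `faulthandler` module can dump every thread's traceback after a timeout, or (on Unix) when the process receives a signal, without stopping the script. A minimal sketch; the `install_hang_dumps` name and its default timeout are assumptions, and on Python 2 `faulthandler` is only available as a separately installed backport.

```python
import faulthandler
import signal


def install_hang_dumps(log_file, timeout_s=300):
    """Hypothetical helper: write tracebacks of all threads to log_file if the
    script is still running after timeout_s seconds, and on SIGUSR1 (Unix).

    log_file must be a real file object (faulthandler writes to its file
    descriptor) and must stay open for as long as the dumps are armed.
    """
    # One-shot dump after timeout_s; exit=False keeps the script running
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file, exit=False)
    # On Unix, `kill -USR1 <pid>` dumps tracebacks on demand for a stuck job
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
```

Typical use: call `install_hang_dumps(open("hang_traceback.log", "w"))` near the start of the script and `faulthandler.cancel_dump_traceback_later()` once the work finishes in time. A job that has been stuck for hours can then be inspected with `kill -USR1 <pid>` without killing it.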