dispy sample program hangs

TL;DR: I can't get the most basic dispy sample code to run properly. Why not?

The details:

I'm trying to get into distributed processing in python, and thought the dispy library sounded interesting, due to the comprehensive feature set.

However, I've been trying to follow their basic canonical program example, and I'm getting nowhere.

  • I've installed dispy (python -m pip install dispy)
  • I went to another machine on the same subnet and ran python dispynode.py. It seems to work, as I get the following output:

    2016-06-14 10:33:38 dispynode - dispynode version 4.6.14
    2016-06-14 10:33:38 asyncoro - version 4.1 with epoll I/O notifier
    2016-06-14 10:33:38 dispynode - serving 8 cpus at 10.0.48.54:51348

    Enter "quit" or "exit" to terminate dispynode, "stop" to stop
    service, "start" to restart service, "cpus" to change CPUs used,
    anything else to get status:

  • Back on my client machine, I run the sample code downloaded from http://dispy.sourceforge.net/_downloads/sample.py, copied here:


# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client)
    cluster = dispy.JobCluster(compute)
    # run 'compute' with 20 random numbers on available CPUs
    jobs = []
    for i in range(20):
        job = cluster.submit(random.randint(5,20))
        job.id = i # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait() # waits until all jobs finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status()  # shows which nodes executed how many jobs etc.

When I run this (python sample.py), it just hangs. Stepping through with pdb, I see it eventually hangs at dispy/__init__.py(117)__call__(), on the line self.finish.wait(). finish is a threading event; its wait() descends into lib/python3.5/threading.py(531)wait() and blocks there.
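Since the client blocks forever waiting for results, one thing worth ruling out is a firewall between the two machines. A minimal sketch of a TCP reachability check, using the ports visible in the logs (51347 on the client, 51348 on the node; note dispy also uses UDP 51347 for node discovery, which this does not test), might look like this. The function name port_open and the example address are my own, not part of dispy:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Address and port taken from the dispynode log above; adjust for your setup.
print(port_open("10.0.48.54", 51348))  # can the client reach the node?
```

If this prints False from the client machine, the hang is likely a network/firewall problem rather than a dispy problem.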

I've tried running dispynode on the client machine itself and gotten the same results. I've also tried a lot of variants of passing nodes when creating the cluster, e.g.:

cluster = dispy.JobCluster(compute, nodes=['localhost'])
cluster = dispy.JobCluster(compute, nodes=['*'])
cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote node running dispynode>])

I've tried running with the cluster.wait() line uncommented, and got the same results.

When I added logging (cluster = dispy.JobCluster(compute, loglevel = 10)), I got the following output on the client side:

2016-06-14 10:27:01 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:27:01 dispy - dispy client at :51347
2016-06-14 10:27:01 dispy - Storing fault recovery information in "_dispy_20160614102701"
2016-06-14 10:27:01 dispy - Pending jobs: 0
2016-06-14 10:27:01 dispy - Pending jobs: 1
2016-06-14 10:27:01 dispy - Pending jobs: 2
2016-06-14 10:27:01 dispy - Pending jobs: 3
2016-06-14 10:27:01 dispy - Pending jobs: 4
2016-06-14 10:27:01 dispy - Pending jobs: 5
2016-06-14 10:27:01 dispy - Pending jobs: 6
2016-06-14 10:27:01 dispy - Pending jobs: 7
2016-06-14 10:27:01 dispy - Pending jobs: 8
2016-06-14 10:27:01 dispy - Pending jobs: 9
2016-06-14 10:27:01 dispy - Pending jobs: 10

This doesn't seem unexpected, but doesn't help me figure out why the jobs aren't running.

For what it's worth, here's _dispy_20160614102701.bak:

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

and similarly, _dispy_20160614102701.dir:

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

I'm out of guesses, unless I'm using an unstable version.

There are 4 answers below

user6466166 answered:

If you're just running sample.py on your client, change the cluster creation in your main block to list your node IPs explicitly:

cluster = dispy.JobCluster(compute, nodes=['nodeip_1', 'nodeip_2', ..., 'nodeip_n'])

Then run it in your IDE, or via shell.

I hope that helps.

ThomasGuenet answered:

dispynode.py must still be running, either on localhost or on another machine, before you execute python sample.py (note that the other machine should be on the same network unless you want to specify more complex options).

I was experiencing the same issue and solved it this way:

  • open a terminal and run $ dispynode.py (do not terminate it)
  • open a second terminal and run $ python sample.py

Keep in mind that compute just sleeps for a random 5-20 seconds, so output may not appear until up to 20 seconds after sample.py starts.

Dave answered:

When first setting up and using dispy on a network, I found that I had to specify the client machine's IP address when creating the job cluster, see below:

cluster = dispy.JobCluster(compute, ip_addr=your_ip_address_here)

See if that helps.
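If you aren't sure which address to pass as ip_addr, a common trick (not dispy-specific; the helper name local_ip is my own) is to let the OS pick the interface it would route traffic through:

```python
import socket

def local_ip():
    """Best-effort guess at this machine's outward-facing IP address.

    connect() on a UDP socket sends no packets; it only makes the OS
    choose the interface it would route through to that address.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))  # any routable address works
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route available; fall back to loopback
    finally:
        s.close()

# Assuming the setup from the question:
# cluster = dispy.JobCluster(compute, ip_addr=local_ip())
```

This avoids the pitfall of gethostname() resolving to 127.0.0.1 on machines where /etc/hosts maps the hostname to loopback.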

mr-suroot answered:

Try this instead:

python /home/$user_name/.local/lib/python3.9/site-packages/dispy/dispynode.py
python sample.py

It worked for me.
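If you don't know where pip put dispynode.py on your machine, you can ask Python itself rather than hard-coding the site-packages path. A small sketch (module_dir is a hypothetical helper; this assumes dispy was installed as an importable package):

```python
import importlib.util
import os

def module_dir(name):
    """Return the directory a module is installed in, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)

# On a machine with dispy installed, this prints the directory
# containing dispynode.py; here it prints None if dispy is absent.
print(module_dir("dispy"))
```

You can then run python <that directory>/dispynode.py regardless of which Python version or install scheme was used.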