How to ensure distribution of a heavy task to other nodes using dispy?

Question

134 Views Asked by Naman Sancheti At 16 March 2016 at 18:33

I'm currently computing the factorials of 10 random numbers using dispy, which "distributes" the tasks to various nodes. However, if one of the computations involves a large number, say factorial(100), that task takes a very long time, yet dispy runs it on only a single node.

How do I make sure that dispy breaks down and distributes this task to other nodes, so that it doesn't take so much time?

Here's the code that I have come up with so far, where the factorials of 10 random numbers are calculated and the job with id 5 always computes factorial(100):

# 'compute' is distributed to each node running 'dispynode'

def compute(n):
    import time, socket
    ans = 1
    for i in range(1,n+1):
        ans = ans * i
    time.sleep(n)
    host = socket.gethostname()
    return (host, n,ans)

if __name__ == '__main__':
    import dispy, random
    cluster = dispy.JobCluster(compute)
    jobs = []
    for i in range(10):
        # schedule execution of 'compute' on a node (running 'dispynode')
        # with a parameter (random number in this case)
        if i == 5:
            job = cluster.submit(100)
        else:
            job = cluster.submit(random.randint(5, 20))
        job.id = i # optionally associate an ID to job (if needed later)
        jobs.append(job)
    # cluster.wait() # waits for all scheduled jobs to finish
    for job in jobs:
        host, n, ans = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s as input and %s as output' % (host, job.id, job.start_time, n,ans))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
    cluster.print_status()


**Chris Johnson** · Answer 1 · 2016-06-14T21:44:21.687000

Dispy distributes the tasks as you define them - it doesn't make the tasks more granular for you.

You could write your own logic to split the task into smaller pieces first. That's probably pretty easy to do for a factorial. However, I wonder if in your case the performance problem is due to this line:

time.sleep(n)

For factorial(100), why do you want to sleep 100 seconds?
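
To illustrate the "split it yourself" idea, here is a minimal sketch, not a definitive implementation. It assumes the factorial is cut into contiguous ranges whose partial products are computed as separate dispy jobs and multiplied together on the client. The helper names `range_product`, `split_range` and `distributed_factorial`, and the default of 4 parts, are made up for this example. The only dispy calls used are the ones from the question (`JobCluster`, `submit`, calling the job for its result) plus `cluster.close()`.

```python
def range_product(lo, hi):
    # Runs on a node: product of the integers lo..hi inclusive.
    # An empty range (lo > hi) yields 1.
    result = 1
    for i in range(lo, hi + 1):
        result *= i
    return result


def split_range(n, parts):
    # Hypothetical helper: split 1..n into at most `parts`
    # contiguous (lo, hi) chunks of nearly equal length.
    parts = max(1, min(parts, n))
    size, extra = divmod(n, parts)
    chunks = []
    lo = 1
    for k in range(parts):
        # The first `extra` chunks get one additional element.
        hi = lo + size - 1 + (1 if k < extra else 0)
        chunks.append((lo, hi))
        lo = hi + 1
    return chunks


def distributed_factorial(n, parts=4):
    # Assumption: 'dispynode' is running on the nodes, as in the question.
    import dispy
    cluster = dispy.JobCluster(range_product)
    # One job per chunk, so the scheduler can place them on different nodes.
    jobs = [cluster.submit(lo, hi) for lo, hi in split_range(n, parts)]
    result = 1
    for job in jobs:
        # job() waits for the job to finish and returns its result;
        # a failed job is not handled in this sketch.
        result *= job()
    cluster.close()
    return result
```

With this, `distributed_factorial(100)` would submit four jobs covering 1..25, 26..50, 51..75 and 76..100 instead of one job for the whole range. Whether that actually helps depends on how expensive each chunk is compared with the overhead of scheduling a job.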
