Poisson Process and multiprocessing. Is there a better solution?


I have to simulate load on a web application.

I wrote some Python code that generates requests whose inter-arrival times follow an exponential distribution (i.e. a Poisson process).

Each request is a simple URL GET: I measure the response time and store it in a file.

So, for a given duration, the code creates a new process that performs the request, then sleeps for a random time drawn from random.expovariate(lambd).

When I start a request I also store a timestamp, so I can check whether the average inter-arrival time is close to 1/lambda.

I have a problem when I set lambda > 20: the measured average is higher than 1/lambda, which means the requests are issued more slowly than intended.

I tested the random generator and it is fine, so I think the problem is the time the system takes to create a new process.
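For reference, a minimal sketch of such a sanity check (using only the standard random module; not necessarily the exact test that was run) compares the sample mean of random.expovariate(lambd) with the expected value 1/lambda:

import random

def check_expovariate(lambd, samples=100000):
    # the sample mean of expovariate(lambd) should be close to 1/lambd
    total = sum(random.expovariate(lambd) for _ in xrange(samples))
    return total / samples

for lambd in [5, 20, 50]:
    print "lambda=%d  mean=%.5f  expected=%.5f" % (
        lambd, check_expovariate(lambd), 1.0 / lambd)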

Is there a way to speed up this phase?

Are there perhaps some limits on process creation?

I forgot to mention that the Python version is 2.7.3 and I can't upgrade it.

Using PyPy there are some performance improvements, but the problem persists.

Here is the code:

import time
import random
import requests
from multiprocessing import Process

def request(results, url):
    start = time.time()
    try:
        r = requests.get(url)
    except:
        pass
    else:
        # Append results (in seconds)
        results.write("{0},{1}\n".format(start, r.elapsed.microseconds/1e6))

def main():
    # Parameters (placeholder values here; set elsewhere in the original code)
    t = 100                      # total duration in seconds
    l = 20                       # rate lambda
    url = "http://example.com/"  # hypothetical target URL
    # Open results file
    results = open("responseTimes.txt", 'a')
    processes = []
    # Perform requests for time t (seconds) with rate lambda=l
    start = time.time()
    elapsed = 0
    while t > elapsed:
        p = Process(target=request, args=(results, url))
        p.daemon = True
        p.start()
        processes.append(p)
        time.sleep(random.expovariate(l))
        elapsed = time.time() - start
    # Wait for all processes to finish
    [p.join() for p in processes]
    # Close the file
    results.close()

if __name__ == "__main__":
    main()
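For completeness, here is a minimal sketch (assuming the comma-separated format written by the code above) of how the stored start timestamps can be turned into an observed request rate for comparison against lambda:

def observed_rate(path="responseTimes.txt"):
    # read the start timestamps (first column) written by request()
    starts = sorted(float(line.split(",")[0]) for line in open(path) if line.strip())
    # average inter-arrival time; its inverse is the achieved request rate
    mean_gap = (starts[-1] - starts[0]) / (len(starts) - 1)
    return 1.0 / mean_gap

print "observed rate: %.2f requests/s" % observed_rate()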

Analysis

It is likely that you are creating too many processes, and that is slowing the system down. While creating a single process usually takes less than 1.0/20 seconds, once many processes are alive the creation time can easily rise above 1.0/20 seconds, which explains the lower-than-expected rate you observe:

from timeit import timeit
from multiprocessing import Process

def launch():
    Process().start()

for n in [1, 10, 100]:
    print timeit(stmt=launch, number=n)

# time in seconds to launch a process
# as measured on a 2.7 GHz Core i7 Mac Mini with OSX 10.9
0.000946998596191
0.00857591629028
0.0778558254242

These numbers may vary on your machine depending on factors like workload, CPU, memory, etc.

Solution

That said, you should instead use an approach in which you start, say, 10-20 processes (or even better: threads, since they have less overhead), each of which issues requests with exponentially distributed inter-arrival times. The superposition of independent Poisson processes is itself a Poisson process, so the combined load is still Poisson; use a per-worker rate of lambda/n if you want the overall rate to stay at lambda:

def worker(n, results, url, l):
    print "worker %d started" % n
    while True:
        # request the URL and record the response time
        request(results, url)
        # wait for the next event with Poisson-process (exponential) time lag
        time.sleep(random.expovariate(l))

def main():
    # your code to initialise
    ...
    # set parameters
    t = 100
    l = 20
    # start n workers
    n = 10
    processes = []
    for i in range(n):
        print "starting worker %d" % i
        p = Process(target=worker, args=(i, results, url, l))
        p.start()
        processes.append(p)
    # now wait for the requested time
    time.sleep(t)
    # stop the workers
    for p in processes:
        p.terminate()
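If threads are preferred over processes, as suggested above, the same pattern can be sketched with threading.Thread. This is a minimal variant that reuses the worker function from the answer; results, url, and the parameters are placeholders assumed to be initialised as in main:

import threading
import time

def main_with_threads():
    # placeholder parameters, as in the answer above
    t, l, n = 100, 20, 10
    url = "http://example.com/"            # hypothetical target URL
    results = open("responseTimes.txt", 'a')
    threads = []
    for i in range(n):
        th = threading.Thread(target=worker, args=(i, results, url, l))
        th.daemon = True   # daemon threads stop when the main program exits
        th.start()
        threads.append(th)
    # run the load for the requested time, then simply return and exit
    time.sleep(t)
    results.close()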