I was trying out the Python multiprocessing module. In the code below, the serial execution time is 0.09 seconds and the parallel execution time is 0.2 seconds. Since I am getting no speedup, I think I might be going wrong somewhere.
import multiprocessing as mp
from random import uniform, randrange
import time

# m = mp.Manager()
out_queue = mp.Queue()

def flop_no(rand_nos, a, b):
    cals = []
    for r in rand_nos:
        cals.append(r + a * b)
    return cals

def flop(val, a, b, out_queue):
    cals = []
    for v in val:
        cals.append(v + a * b)
    # print cals
    out_queue.put(cals)
    # print "Exec over"

def concurrency():
    # out_queue1 = mp.Queue()
    # out_queue2 = mp.Queue()
    a = 3.3
    b = 4.4
    rand_nos = [uniform(1, 4) for i in range(1000000)]
    print len(rand_nos)
    # for i in range(5):
    start_time = time.time()
    p1 = mp.Process(target=flop, args=(rand_nos[:250000], a, b, out_queue))
    p2 = mp.Process(target=flop, args=(rand_nos[250000:500000], a, b, out_queue))
    p3 = mp.Process(target=flop, args=(rand_nos[500000:750000], a, b, out_queue))
    p4 = mp.Process(target=flop, args=(rand_nos[750000:], a, b, out_queue))
    p1.start()
    out_queue.get()
    # print "\nFinal:", len(out_queue.get())
    p2.start()
    out_queue.get()
    # print "\nFinal:", len(out_queue.get())
    p3.start()
    out_queue.get()
    p4.start()
    out_queue.get()
    p1.join()
    p2.join()
    p3.join()
    p4.join()
    print "Running time parallel: ", time.time() - start_time, "secs"

def no_concurrency():
    a = 3.3
    b = 4.4
    rand_nos = [uniform(1, 4) for i in range(1000000)]
    start_time = time.time()
    cals = flop_no(rand_nos, a, b)
    print "Running time serial: ", time.time() - start_time, "secs"

if __name__ == '__main__':
    concurrency()
    no_concurrency()
    # print "Program over"
My system has four cores. Please let me know of ways I can speed up this code. Also, what are my options for parallel programming with Python (other than the multiprocessing module)?
Thanks and Regards
out_queue.get() blocks until a result is available by default, so you are essentially starting a process and waiting for it to finish before starting the next one. Instead, start all the processes first, then get all the results. For example:
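Here is a minimal sketch of that rearrangement, reusing the question's flop() and data split (the commented-out time.sleep(3) is the line referred to in the timing discussion below):

import multiprocessing as mp
from random import uniform
import time

def flop(val, a, b, out_queue):
    cals = []
    for v in val:
        cals.append(v + a * b)
    # time.sleep(3)
    out_queue.put(cals)

def concurrency():
    a = 3.3
    b = 4.4
    rand_nos = [uniform(1, 4) for i in range(1000000)]
    out_queue = mp.Queue()
    start_time = time.time()
    # start all four workers before collecting any results
    procs = [mp.Process(target=flop,
                        args=(rand_nos[i * 250000:(i + 1) * 250000], a, b, out_queue))
             for i in range(4)]
    for p in procs:
        p.start()
    # then drain the queue: four result lists, one per worker
    results = [out_queue.get() for p in procs]
    for p in procs:
        p.join()
    print "Running time parallel: ", time.time() - start_time, "secs"

if __name__ == '__main__':
    concurrency()

Getting the results before calling join() also matters: a child process that has put a large list on the queue may not be able to exit until that data has been read from the queue.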
Note that the parallel time is still slower. This is due to the overhead of starting four other Python processes. Your processing time for the whole job is only about 0.2 seconds; the 3.5 seconds for the parallel run is mostly just starting up the processes.
Note the commented-out # time.sleep(3) above in flop(). Add that code in and the overall time only goes up by about 3 seconds (not 12), because the four sleeps run in parallel. You need a lot more data to make parallel processing worthwhile.
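Concretely, that experiment just amounts to uncommenting the sleep in the sketch above, so each worker does three extra seconds of (simulated) work:

def flop(val, a, b, out_queue):
    cals = []
    for v in val:
        cals.append(v + a * b)
    time.sleep(3)  # simulate 3 extra seconds of work in each of the four workers
    out_queue.put(cals)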
Here's a version where you can visually see how long it takes to start the processes: "here" is printed as each process begins to run flop(). An event is used to release all of the worker processes at the same time, so that only the processing time is counted:
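A sketch of what such a version could look like, assuming a ready_queue handshake (an addition here) so the timer starts only after every worker has reached the event:

import multiprocessing as mp
from random import uniform
import time

def flop(val, a, b, out_queue, ready_queue, start_event):
    print "here"              # visible marker: this worker process has started
    ready_queue.put(True)     # tell the main process this worker is ready
    start_event.wait()        # block until all workers are released together
    cals = []
    for v in val:
        cals.append(v + a * b)
    out_queue.put(cals)

def concurrency():
    a = 3.3
    b = 4.4
    rand_nos = [uniform(1, 4) for i in range(1000000)]
    out_queue = mp.Queue()
    ready_queue = mp.Queue()
    start_event = mp.Event()
    procs = [mp.Process(target=flop,
                        args=(rand_nos[i * 250000:(i + 1) * 250000],
                              a, b, out_queue, ready_queue, start_event))
             for i in range(4)]
    for p in procs:
        p.start()             # the slow part: spawning four interpreters
    for p in procs:
        ready_queue.get()     # wait until all four workers have printed "here"
    start_time = time.time()  # time only the actual processing from here on
    start_event.set()         # release all workers at once
    results = [out_queue.get() for p in procs]
    for p in procs:
        p.join()
    print "Running time parallel: ", time.time() - start_time, "secs"

if __name__ == '__main__':
    concurrency()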
Now the processing time comes out faster, though not by a lot, probably due to the interprocess communication needed to get the results back.