In Python 2.7 I am trying to distribute the computation of a two-dimensional array across all of the cores.
For that I have two arrays at global ( module ) scope, one to read from and one to write to.
import itertools as it
import multiprocessing as mp
import numpy as np
temp_env = 20
c = 0.25
a = 0.02
arr = np.ones((100,100))
x = arr.shape[0]
y = arr.shape[1]
new_arr = np.zeros((x,y))
def calc_inside(idx):
    # explicit stencil update of one interior cell of new_arr from arr
    new_arr[idx[0],idx[1]] = ( arr[idx[0], idx[1] ]
                             + c * ( arr[idx[0]+1,idx[1] ]
                                   + arr[idx[0]-1,idx[1] ]
                                   + arr[idx[0], idx[1]+1]
                                   + arr[idx[0], idx[1]-1]
                                   - arr[idx[0], idx[1] ]*4
                                   )
                             - 2 * a
                                 * ( arr[idx[0], idx[1] ]
                                   - temp_env
                                   )
                             )
inputs = it.product( range( 1, x-1 ),
                     range( 1, y-1 )
                     )
p = mp.Pool()
p.map( calc_inside, inputs )
#for i in inputs:
# calc_inside(i)
#plot arrays as surface plot to check values
Assume there is some additional initialization of the array arr with values other than that exemplary 1-s, so that the computation ( an iterative calculation of the temperature ) actually makes sense.
When I use the commented-out for-loop instead of the Pool.map() method, everything works fine and the array actually contains values. When using the Pool(), the variable new_arr just stays in its initialized state ( meaning it contains only the zeros it was originally initialised with ).
Q1 : Does that mean that Pool() prevents writing to global variables?
Q2 : Is there any other way to tackle this problem with parallelization?
A1 :
Your code actually does not use any variable declared using a syntax of global <variable>. The actual mechanics are elsewhere: every worker the Pool() spawns is a separate operating-system process with its own private copy of the parent's globals, so whatever a worker writes into its copy of new_arr never travels back into the parent process. Nevertheless, do not try to go into using global variables for this, and even less so when going into distributed processing.
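If a Pool()-based attempt is still wanted in spite of the add-on costs detailed below, a minimal sketch of a workable arrangement is to let each worker return its computed value and let the parent process do all the writes into new_arr. The helper calc_one() below is just an illustration and the setup values merely mirror the question:

import itertools as it
import multiprocessing as mp
import numpy as np

temp_env = 20
c        = 0.25
a        = 0.02
arr      = np.ones( ( 100, 100 ) )
new_arr  = np.zeros( arr.shape )

def calc_one( idx ):
    # compute and RETURN the new value instead of writing to a global;
    # any write inside a worker would land only in that worker's own copy
    i, j = idx
    return ( i, j, arr[i, j]
                   + c * ( arr[i+1, j] + arr[i-1, j]
                         + arr[i, j+1] + arr[i, j-1]
                         - 4 * arr[i, j] )
                   - 2 * a * ( arr[i, j] - temp_env ) )

if __name__ == '__main__':
    inputs = it.product( range( 1, arr.shape[0] - 1 ),
                         range( 1, arr.shape[1] - 1 ) )
    p = mp.Pool()
    for i, j, val in p.map( calc_one, inputs ):
        new_arr[i, j] = val            # all writes happen in the parent process
    p.close()
    p.join()

Whether such a setup ever pays off is exactly what the cost analysis below is about; for this little per-cell work it typically will not.

A2 :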
Yes, going parallel is possible, but better get a thorough sense of the costs of doing so first, before spending ( well, wasting ) efforts that the results will never justify.
Why start with the costs?
Would you pay your bank clerk $2.00 just to receive a $1.00 banknote in exchange?
Guess no one would ever do that.
The same goes for trying to go parallel.
The syntax is "free" and "promising"; the costs of actually executing that simple and nicely-looking syntax-constructor are not. Expect rather shocking surprises instead of any free dinner.
What are the actual costs? Benchmark. Benchmark. Benchmark!
The useful work
Your code actually does just a few memory accesses and a few floating point operations "inside" the block and exits. These FLOP-s take less than a few tens of [ns], at most a few units of [us], on recent CPU frequencies of ~ 2.6 ~ 3.4 [GHz]. Do benchmark it.
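A minimal benchmarking sketch, assuming the calc_inside() and the arrays defined in the question are already in scope ( the absolute figures will of course differ per platform ):

from timeit import timeit

# time one "inside" cell update, averaged over many repetitions
n_calls = 1000000
t_one   = timeit( lambda: calc_inside( ( 50, 50 ) ), number = n_calls ) / n_calls
print( "one cell update ~ %.1f [ns]" % ( t_one * 1e9 ) )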
So a pure [SERIAL]-process execution, on a Quad-core CPU, will not be worse than about a T+T+T+T ( the four parts having been executed one after another ).

What will happen, if the same amount of useful work is enforced to happen inside some form of the now [CONCURRENT]-process execution ( using multiprocessing.Pool's methods ), potentially using more CPU-cores for the same purpose? The actual compute-phase will not last less than T again, right? Why ever would it? Yes, never.

The overheads, i.e. the costs you will always pay per-call ( and that is indeed many times )
Subprocess setup + termination costs ( benchmark these to learn the scale of such costs ). Per-call parameter passing costs ( every index tuple and every return value gets transferred between processes ). Memory access costs ( latency, where zero cache re-use happens to help, as you exit right after the write ).
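A sketch of an end-to-end comparison of the two paths, re-using the question's definitions as-is ( run it under an if __name__ == '__main__': guard on platforms that spawn rather than fork; the absolute numbers are indicative only, the point is their ratio ):

import time

inputs = list( it.product( range( 1, x-1 ),      # materialise the index list once,
                           range( 1, y-1 ) ) )   # so both paths receive identical work

t0       = time.time()
for i in inputs:                                 # the pure [SERIAL] path
    calc_inside( i )
t_serial = time.time() - t0

t0     = time.time()
p      = mp.Pool()                               # subprocess setup costs start here
p.map( calc_inside, inputs )                     # + per-call parameter passing costs
p.close()
p.join()                                         # subprocess termination costs end here
t_pool = time.time() - t0

print( "serial for-loop : %.6f [s]" % t_serial )
print( "Pool().map()    : %.6f [s]" % t_pool )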
Epilogue
Hope the message is clear enough to ALWAYS start accounting for the accrued costs before deciding about any achievable benefits from re-engineering code into a distributed process-flow, however tempting a multi-core or even many-core [CONCURRENT] fabric may be, just by being available.