Is ipython or numpy secretly parallelizing matrix multiplication?


The situation is the following: I wanted to compare the runtime of a matrix multiplication using ipython parallel against a plain run on a single core.

Code for normal execution:

import time
import numpy as np

n = 13
dim_1, dim_2, dim_3, dim_4 = 2**n, 2**n, 2**n, 2**n
A = np.random.random((dim_1, dim_2))
B = np.random.random((dim_3, dim_4))
start = time.perf_counter()  # monotonic clock intended for measuring elapsed time
C = np.matmul(A, B)
dur = time.perf_counter() - start

This takes about 24 seconds on my notebook. I then tried to parallelize the same computation. I start four engines with ipcluster start -n 4 (I have 4 cores), then run the following in my notebook:

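import time
from ipyparallel import Client

c = Client()
dview = c[:]  # DirectView on all engines; a load-balanced view has no scatter/gather
%px import numpy

def pdot(view_obj, A_mat, B_mat):
    view_obj['B'] = B_mat                          # push the full B to every engine
    view_obj.scatter('A', A_mat, block=True)       # split A row-wise across the engines
    view_obj.execute('C = A.dot(B)', block=True)   # each engine multiplies its row block
    return view_obj.gather('C', block=True)        # concatenate the row blocks again

start = time.perf_counter()
C_par = pdot(dview, A, B)
dur1 = time.perf_counter() - start
dur1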

This takes approximately 34 seconds. In the system monitor I can see that all cores are used in both cases. In the parallel case there seems to be some overhead during which the cores are not at 100 % usage (I suppose that is the phase where the data gets scattered across the engines). In the non-parallel case all cores go to 100 % usage immediately. This surprises me, as I always thought Python ran on a single core by default.
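One way to test whether the "single core" run is really single-threaded: NumPy hands np.matmul to a BLAS library, and common builds (OpenBLAS, MKL) use several threads on their own. The sketch below tries to limit BLAS to one thread through environment variables before timing the product. Which variable takes effect depends on the BLAS build NumPy was linked against, so setting all three is an assumption, not a guarantee. The small n is only there to keep the sketch fast.

```python
import os

# Assumption: these must be set BEFORE numpy is imported, because the BLAS
# thread pool is typically sized when the library is first loaded.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import time
import numpy as np

n = 9  # much smaller than n = 13 above, so this finishes quickly
A = np.random.random((2**n, 2**n))
B = np.random.random((2**n, 2**n))

start = time.perf_counter()
C = np.matmul(A, B)
dur = time.perf_counter() - start
print(f"matmul with BLAS limited to one thread: {dur:.3f} s")
```

If the system monitor now shows only one busy core for the plain run, the earlier 24 seconds were already a multithreaded result, and comparing them with four ipyparallel engines was not a single-core versus four-core comparison. Calling np.show_config() prints which BLAS library the installed NumPy uses.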

I would be happy if somebody has more insight into this.
