from tqdm import tqdm
from joblib import Parallel, delayed
import numpy as np
import time
def func(x,a,b,c):
"""x3+ax2+bx+c"""
return x**3+a*x**2+b*x+c
def func_prime(x,a,b):
"""3x2+2ax+b"""
return 3*x**2+2*a*x+b
def check(i):
a,b,c = i
if func_prime(-1/4, a,b) != -1/4:
return False
if func_prime(1/4,a,b) >=0:
return False
for i in range(-300,300):
if func(i-1, a,b,c)*func(i+1,a,b,c)<0:
return False
return a,b,c
def gen():
a=np.arange(-1, 1, 1/128)
b=np.arange(-1, 1, 1/128)
c=np.arange(-1, 1, 1/128)
for i_a in a:
for i_b in b:
for i_c in c:
yield i_a, i_b, i_c
# single thread. it spend 11 seconds.
result = set()
for i in tqdm(gen()):
result.update([check(i)])
# using joblib. it spend 313 seconds.
start_time = time.time()
_ = Parallel(n_jobs=10)(delayed(check)(i) for i in gen())
end_time = time.time()
print(end_time - start_time)
I'm tring to solve a math quiz for fun, and solution is "fx=x3-0.375x2-0.625x + 0".
My code test all combinations of values, so I thought parallel programming make it faster. But this is not so, it is much slower than with a single thread.
My questions:
- why multithread is slower?
- how can I fix it faster than single thread?
- early stopping. if I find a solution for it, quit asap. how can I implement that?
tl;dr: You're iterating in the wrong place.
gen()
produces many values (triples), which are serialized and sent down pipes to worker sub-processes.First, that second
i
is the wrong identifier, it should bex
. Starting out with a tuple and then switching to a scalar is just bizarre, avoid doing that.Second, the
for
loop may exit early. Now, if we typically do six hundred iterations, this could possibly be a sensible parent-child work balance. But I get the sense that we're spending a lot of time serializing those 3-tuples, sending them down a pipe, deserializing them, and quickly doing an early exit long before hitting six hundred.You have nested a, b, c loops and ten cores. I recommend that the parent only loops over
a
values, and lets the child worry about theb, c
iteration. That way you spend a greater fraction of CPU cycles on evaluatingfunc
than on serializing / deserializing.Separate issue:
It's not usually recommended to test a float for exact equality (or in this case, inequality). When comparing
x == y
, prefer to compute whetherabs(x - y)
is less than some small epsilon (or in this case, greater than ϵ ).We have a special case, here, reciprocal of a power of 2, so IEEE-754 can exactly represent the quantity of interest. But in general, you should develop some healthy level of paranoia about FP rounding issues.