I have a problem, which, when simplified:
- has a loop which samples new points
- evaluates them with a complex/slow function
- accepts them if the value is above an ever-increasing threshold.
Here is example code for illustration:
```
from numpy.random import uniform
from time import sleep

def userfunction(x):
    # do something complicated,
    # but the computation always takes roughly the same time
    sleep(1)  # comment this out if too slow
    xnew = uniform()  # in reality, a non-trivial function of x
    y = -0.5 * xnew**2
    return xnew, y

x0, cur = userfunction([])
x = [x0]  # a sequence of points

while cur < -2e-16:
    # this should be parallelised:
    # search for a new point higher than a threshold
    x1, ynew = userfunction(x)
    if ynew <= cur:
        # throw away (this branch is taken 99% of the time)
        pass
    else:
        cur = ynew
        print(cur)
        x.append(x1)  # note that userfunction depends on x

print(x)
```
I want to parallelise this (e.g. across a cluster), but the problem is that I need to terminate the other workers when a successful point has been found, or at least inform them of the new x (if they manage to get above the new threshold with an older x, the result is still acceptable). As long as no point has been successful, I need the workers to keep trying.
I am looking for tools/frameworks which can handle this type of problem, in any scientific programming language (C, C++, Python, Julia, etc., no Fortran please).
Can this be solved with MPI semi-elegantly? I don't understand how I can inform/interrupt/update workers with MPI.
Update: I added code comments to clarify that most tries are unsuccessful and do not influence the variable that userfunction depends on.
If `userfunction()` does not take too long, then here is an option that qualifies for "MPI semi-elegantly". To keep things simple, let's assume rank 0 is only an orchestrator and does not compute anything.
On rank 0:
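A minimal sketch of what that could look like, assuming mpi4py (the same pattern works with the C/C++ API); the tag values and the use of `None` as a stop sentinel are illustrative choices, not part of any fixed protocol:

```
# Sketch only: rank 0 orchestrates, ranks != 0 evaluate userfunction.
# mpi4py and the tag/sentinel conventions below are assumptions.
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

TAG_RESULT = 1    # worker -> rank 0: a candidate (x1, y) above its local threshold
TAG_UPDATE = 2    # rank 0 -> workers: the new (x, cur), or None to stop

THRESHOLD = -2e-16
x = []                # accepted sequence of points
cur = float('-inf')   # current best value

while cur < THRESHOLD:
    # wait for any worker to report a candidate
    x1, y = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_RESULT)
    if y > cur:
        cur = y
        x.append(x1)
        # push the new threshold and sequence to every worker
        for dest in range(1, size):
            comm.send((x, cur), dest=dest, tag=TAG_UPDATE)
    # candidates that are no longer above the threshold are simply discarded

# tell the workers to stop (None used as a sentinel)
for dest in range(1, size):
    comm.send(None, dest=dest, tag=TAG_UPDATE)
print(x)
```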
On rank != 0:
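A matching worker sketch, again assuming mpi4py, the tags above and the `userfunction` from the question; the periodic `Iprobe` poll is what stands in for a real interruption (the extra `TAG_DONE` message is used by the clean-up note further down):

```
# Sketch only: a worker keeps sampling, reports improvements to rank 0,
# and periodically polls for updates instead of being interrupted.
from mpi4py import MPI

comm = MPI.COMM_WORLD

TAG_RESULT = 1
TAG_UPDATE = 2
TAG_DONE = 3          # assumed extra tag: "this worker has stopped"

x = []                # locally known accepted sequence
cur = float('-inf')   # locally known threshold
done = False

while not done:
    x1, y = userfunction(x)          # the slow evaluation from the question
    if y > cur:
        # report the candidate; rank 0 decides whether it still counts
        comm.send((x1, y), dest=0, tag=TAG_RESULT)
    # non-blocking check for updates from rank 0
    while comm.Iprobe(source=0, tag=TAG_UPDATE):
        msg = comm.recv(source=0, tag=TAG_UPDATE)
        if msg is None:              # sentinel: the search is over
            done = True
            break
        x, cur = msg                 # adopt the new sequence and threshold

comm.send(None, dest=0, tag=TAG_DONE)   # tell rank 0 this worker exited
```

In a single program, the two parts would simply sit behind a branch on `comm.Get_rank()`, and the whole thing would be launched with something like `mpiexec -n 16 python search.py` (script name assumed).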
Extra logic is needed to correctly handle:
- rank 0 doing computation too
- several ranks finding the solution at the same time: subsequent messages must be consumed by rank 0 (see the sketch after this list)
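For the last point, one option (a sketch meant to be appended to the rank-0 code above, reusing the assumed `TAG_DONE`) is to keep receiving after the stop message has gone out, until every worker has confirmed it exited, discarding any stale candidates along the way:

```
# Sketch only: run on rank 0 right after the stop sentinel has been sent.
# Late TAG_RESULT messages are received and dropped; TAG_DONE counts workers.
finished = 0
while finished < size - 1:
    status = MPI.Status()
    comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
    if status.Get_tag() == TAG_DONE:
        finished += 1
    # anything else is a stale candidate and is simply ignored
```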
Strictly speaking, a task is not "interrupted" when a solution is found on another task. Instead, each task periodically checks whether a solution was found by another task, so there is a delay between the time a solution is found somewhere and the time all tasks stop looking for solutions. But if `userfunction()` does not take "too long", this looks very acceptable to me.