Executing Python code in parallel with ndb tasklets


First of all, I know I can use threading to accomplish such a task, like so:

import Queue
import threading


# called by each thread
def do_stuff(q, arg):
    result = heavy_operation(arg)
    q.put(result)

operations = range(1, 10)

q = Queue.Queue()

for op in operations:
    t = threading.Thread(target=do_stuff, args = (q,op))
    t.daemon = True
    t.start()

# q.get() blocks until a result arrives; read one result per operation
for _ in operations:
    print q.get()

However, Google App Engine has something called ndb tasklets, and according to the documentation you can use them to execute code in parallel.

Tasklets are a way to write concurrently running functions without threads; tasklets are executed by an event loop and can suspend themselves blocking for I/O or some other operation using a yield statement. The notion of a blocking operation is abstracted into the Future class, but a tasklet may also yield an RPC in order to wait for that RPC to complete.

Is it possible to accomplish something like the example with threading above?

I already know how to retrieve entities using get_async() (I got that from the examples on the documentation page), but it's very unclear to me how this applies to parallel code execution.

Thanks.


There is 1 best solution below


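The answer depends on what your heavy_operation really is. If heavy_operation uses RPCs (Remote Procedure Calls, such as datastore access or UrlFetch), then the answer is yes. If it is pure computation that never yields on an RPC, wrapping it in a tasklet will not make it run in parallel.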

I asked a similar question in "how to understand appengine ndb.tasklet?"; you may find more details there.

Can I put any kind of code inside a function, decorate it with ndb.tasklet, and then use it as an async function later? Or must it be an App Engine RPC?

The Answer

Technically yes, but it will not run asynchronously. When you decorate a non-yielding function with @tasklet, its Future's value is computed and set when you call that function. That is, it runs through the entire function when you call it. If you want to achieve asynchronous operation, you must yield on something that does asynchronous work. Generally in GAE it will work its way down to an RPC call.
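To make the difference concrete, here is a toy model in plain Python. It does not use ndb. The names toy_tasklet, ToyFuture, Return and run_all are all made up for illustration, and the real ndb event loop differs in its details. It only sketches the idea above: a decorated function without a yield has already finished by the time the call returns, while functions that yield can be interleaved by a loop.

```python
import types

log = []  # records the order in which work actually runs


class ToyFuture(object):
    """Hypothetical stand-in for a Future; this is not ndb's class."""

    def __init__(self):
        self.done = False
        self.value = None
        self.gen = None  # set only when the wrapped function is a generator


class Return(Exception):
    """Toy way for a generator to hand back a value (loosely like ndb.Return)."""


def toy_tasklet(func):
    """Toy decorator, NOT ndb.tasklet: wraps the call result in a ToyFuture."""
    def wrapper(*args):
        fut = ToyFuture()
        out = func(*args)  # a plain function runs to completion right here
        if isinstance(out, types.GeneratorType):
            fut.gen = out  # in this toy, the generator body has not started yet
        else:
            fut.done, fut.value = True, out
        return fut
    return wrapper


def run_all(futures):
    """Toy event loop: advance every unfinished generator one step per round."""
    pending = [f for f in futures if not f.done]
    while pending:
        for fut in list(pending):
            try:
                next(fut.gen)
            except Return as ret:
                fut.done, fut.value = True, ret.args[0]
                pending.remove(fut)
            except StopIteration:
                fut.done = True
                pending.remove(fut)


@toy_tasklet
def eager(n):
    log.append("eager %d" % n)  # no yield, so this runs at call time
    return n * n


@toy_tasklet
def cooperative(n):
    log.append("start %d" % n)
    yield  # stands in for waiting on an RPC
    log.append("finish %d" % n)
    raise Return(n * n)


a = eager(2)
done_at_call = a.done  # True: the value was set by the call itself

futs = [cooperative(1), cooperative(2)]
run_all(futs)
results = [f.value for f in futs]
```

In this sketch the log shows "start 1" and "start 2" before either "finish" entry, so the two yielding functions overlap, whereas eager(2) is finished before anything else gets a chance to run. With real tasklets the yield would be on an RPC future such as the one returned by get_async(), which is where the actual waiting time is saved.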