I have expensive jobs that are well suited to the map-reduce model (in short, they aggregate a few hundred rankings, each previously computed by a time-consuming algorithm).
I want to parallelize these jobs across a cluster (not just via multiprocessing), and have narrowed the choice down to two frameworks: Celery and Disco. Celery does not support map-reduce out of the box; the "map" part is easily done with TaskSets, but how do you implement the "reduce" part efficiently?
(My problem with Disco is that it does not run on Windows, and I have already set up Celery for another part of the program, so pulling in a second framework just for map-reduce seems inelegant.)
Take a look at the following blog post:
http://mikecvet.wordpress.com/2010/07/02/parallel-mapreduce-in-python/