My algorithm currently uses nr_reduces 1 because I need to ensure that the data for a given key is aggregated.
To pass input to the next iteration, one should use "chain_reader". However, the results from a mapper are as a single result list, and it appears this means that the next map iteration takes place as a single mapper! Is there a way to split the results to trigger multiple mappers?
I could give a long answer but since this question is 3 years old: check out this page: http://discoproject.org/doc/disco/howto/dataflow.html#single-partition-map
In short: When there is N input for the mapper function, the output will be N and by setting
merge_partitions=False
your reduce will output N blobs. Now if you want to generate more outputs than inputs you can passpartions=N
. But when your disco job consists of just a mapper function and you want to generate partitioned output, then add the simplest reduce fase combined with the params stated above to get that partitioned output.