Putting a combiner to use in MapReduce secondary sorting


I have implemented secondary sorting for my application.

File-1                          File-2                    File-3
------                          ------                    ------

name,pos,r,value           name,pos,r,value            name,pos,r,value

   aa,1,0,123                 aa,2,1,1                    aa,3,1,11
   bb,1,0,234                 aa,2,2,34                   aa,3,2,12
                              aa,2,3,55                   aa,3,3,13
                              bb,2,1,99                   bb,3,1,15
                              bb,2,2,54                   bb,3,2,19
                              bb,2,3,32                   bb,3,3,13

For every record in File-1, there are three matching records in File-2 and three in File-3.

The composite key is: name + (pos+r)

The natural key is: name

The sort order is based on the composite key: ascending by (pos+r).

Expected output is

The File-1 contents for a particular name (aa), followed by all the File-2 contents (three rows of aa ordered by pos+r), followed by the File-3 contents (three rows of aa ordered by pos+r).

aa,123,1,34,55,11,12,13

bb,234,99,54,32,15,19,13
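To make the key setup concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of what the sort and grouping comparators are meant to achieve on the sample data. The names `SecondarySortSketch`, `Row`, `SORT_ORDER`, `GROUP_ORDER` and `reduceLines` are made up for illustration. It also assumes that "(pos+r)" means ordering by pos first and then by r, since that is the reading that reproduces the expected output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    /** One input row: name is the natural key, (name, pos, r) the composite key. */
    static final class Row {
        final String name;
        final int pos;
        final int r;
        final int value;

        Row(String name, int pos, int r, int value) {
            this.name = name;
            this.pos = pos;
            this.r = r;
            this.value = value;
        }
    }

    // Stand-in for the sort comparator: natural key, then pos, then r (assumed order).
    static final Comparator<Row> SORT_ORDER =
            Comparator.comparing((Row row) -> row.name)
                    .thenComparingInt(row -> row.pos)
                    .thenComparingInt(row -> row.r);

    // Stand-in for the grouping comparator: looks at the natural key only.
    static final Comparator<Row> GROUP_ORDER =
            Comparator.comparing((Row row) -> row.name);

    /** Sorts by composite key, then emits one line per natural-key group. */
    static List<String> reduceLines(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(SORT_ORDER);

        List<String> lines = new ArrayList<>();
        StringBuilder line = null;
        Row previous = null;
        for (Row row : sorted) {
            // A new group starts whenever the grouping comparator sees a different key.
            if (previous == null || GROUP_ORDER.compare(previous, row) != 0) {
                if (line != null) {
                    lines.add(line.toString());
                }
                line = new StringBuilder(row.name);
            }
            line.append(',').append(row.value);
            previous = row;
        }
        if (line != null) {
            lines.add(line.toString());
        }
        return lines;
    }

    /** The rows of File-1, File-2 and File-3 from the question, deliberately shuffled. */
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("bb", 3, 2, 19), new Row("aa", 2, 3, 55), new Row("aa", 1, 0, 123),
                new Row("bb", 2, 1, 99), new Row("aa", 3, 1, 11), new Row("bb", 1, 0, 234),
                new Row("aa", 2, 1, 1), new Row("bb", 3, 3, 13), new Row("aa", 3, 3, 13),
                new Row("bb", 2, 3, 32), new Row("aa", 2, 2, 34), new Row("bb", 3, 1, 15),
                new Row("aa", 3, 2, 12), new Row("bb", 2, 2, 54));
    }

    public static void main(String[] args) {
        // Prints the two expected output lines for aa and bb.
        reduceLines(sampleRows()).forEach(System.out::println);
    }
}
```

In a real job the two comparators would be the classes passed to setSortComparatorClass and setGroupingComparatorClass; the loop above only imitates how one reduce call sees all values of a natural key in composite-key order.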

I have implemented this with secondary sorting, using setGroupingComparatorClass, setSortComparatorClass and a custom partitioner.

My doubts are:

1) How do I add a combiner for this scenario?

  • According to my understanding, the grouping and sorting happen in the reduce phase, once all the map outputs (which are partitioned based on the natural key) have been transferred to the reduce machine.

2) If a combiner is added, how and when will the sorting happen so that the reduce function receives the outputs from all mappers in the proper order?

  • Will the map outputs be sorted twice, once in the combiner that is executed after every map and again on the reducer side to sort all the combiner outputs?

There is 1 best solution below

KrazyGautam On

I suggest you first go through http://bytepadding.com/big-data/map-reduce/understanding-map-reduce-the-missing-guide/

  1. Sorting happens on the mapper.
  2. Merging (merging the already sorted map outputs) happens on the reducer.
  3. A combiner is an extra layer in which you try to reduce on the mapper side.
  4. A reducer always receives all the values for a given key.
  5. The mapper sends the values of a given key in sorted order.
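The points above can be imitated with a small plain-Java sketch (no Hadoop dependency): each mapper sorts its own output, and the reducer only merges those sorted runs. All names here (`ShuffleOrderSketch`, `mapSideSort`, `reduceSideMerge`, `demoMerge`) are hypothetical, the three "mappers" are assumed to read one input file each, and the composite key is assumed to order by name, then pos, then r. A combiner, if configured, would be applied to each mapper's sorted run before the merge; it is left out because the question does not say what it should compute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShuffleOrderSketch {

    /** One map output record keyed by the composite key (name, pos, r). */
    static final class Row {
        final String name;
        final int pos;
        final int r;
        final int value;

        Row(String name, int pos, int r, int value) {
            this.name = name;
            this.pos = pos;
            this.r = r;
            this.value = value;
        }
    }

    // Stand-in for the sort comparator (assumed order: name, then pos, then r).
    static final Comparator<Row> SORT_ORDER =
            Comparator.comparing((Row row) -> row.name)
                    .thenComparingInt(row -> row.pos)
                    .thenComparingInt(row -> row.r);

    /** Map side: every mapper sorts its own output before it is shipped. */
    static List<Row> mapSideSort(List<Row> mapperOutput) {
        List<Row> run = new ArrayList<>(mapperOutput);
        run.sort(SORT_ORDER);
        return run;
    }

    /** Reduce side: k-way merge of the sorted runs, no full re-sort. */
    static List<Row> reduceSideMerge(List<List<Row>> runs) {
        // Each queue entry is {run index, offset in run}, ordered by the row it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> SORT_ORDER.compare(runs.get(a[0]).get(a[1]),
                                             runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<Row> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<Row> run = runs.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    /** Values of one natural key, in the order the reducer would iterate over them. */
    static String valuesFor(String name, List<Row> merged) {
        StringBuilder out = new StringBuilder();
        for (Row row : merged) {
            if (row.name.equals(name)) {
                if (out.length() > 0) {
                    out.append(',');
                }
                out.append(row.value);
            }
        }
        return out.toString();
    }

    /** Three hypothetical mappers, one per file, each emitting rows out of order. */
    static List<Row> demoMerge() {
        List<Row> mapper1 = Arrays.asList(new Row("bb", 1, 0, 234), new Row("aa", 1, 0, 123));
        List<Row> mapper2 = Arrays.asList(
                new Row("bb", 2, 3, 32), new Row("aa", 2, 2, 34), new Row("aa", 2, 1, 1),
                new Row("bb", 2, 1, 99), new Row("aa", 2, 3, 55), new Row("bb", 2, 2, 54));
        List<Row> mapper3 = Arrays.asList(
                new Row("aa", 3, 3, 13), new Row("bb", 3, 2, 19), new Row("aa", 3, 1, 11),
                new Row("bb", 3, 1, 15), new Row("bb", 3, 3, 13), new Row("aa", 3, 2, 12));
        return reduceSideMerge(Arrays.asList(
                mapSideSort(mapper1), mapSideSort(mapper2), mapSideSort(mapper3)));
    }
}
```

With the sample data, `valuesFor("aa", demoMerge())` gives `123,1,34,55,11,12,13`, the same order as in the expected output of the question. The sort is done once per mapper, and the reduce side only has to merge.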

Please make yourself familiar with the grouping comparator and the sort comparator, and use them appropriately.