I'm trying to sort the mapper output with a different ordering on each column. My output looks like this:
xx yy 2 4
xx yy 1 5
xx yy 5 39
xx yy 8 3
The first 2 columns are text and the last 2 columns are numbers.
This is how I'm trying to do it:
-D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator
-D "mapreduce.partition.keycomparator.options=-k1,2 -k3,3nr -k4,4nr"
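To make the intended order concrete, here is a small local sketch of what -k1,2 -k3,3nr -k4,4nr should produce on the sample rows. It is plain Python, not Hadoop code. It assumes whitespace-separated fields, and sort_key and sort_records are hypothetical helper names. Fields 1 and 2 are compared together as text in ascending order. Fields 3 and 4 are compared as numbers in descending order. It can be handy for checking the job's real output against the expected order.

```python
from typing import List, Tuple


def sort_key(line: str) -> Tuple[str, float, float]:
    """Hypothetical local emulation of '-k1,2 -k3,3nr -k4,4nr'."""
    fields = line.split()  # assumption: whitespace-separated fields
    # -k1,2: fields 1 through 2 compared together as plain text, ascending
    text_key = " ".join(fields[0:2])
    # -k3,3nr and -k4,4nr: numeric and reversed, so negate to sort descending
    return (text_key, -float(fields[2]), -float(fields[3]))


def sort_records(lines: List[str]) -> List[str]:
    """Return the lines in the order the comparator options describe."""
    return sorted(lines, key=sort_key)


if __name__ == "__main__":
    sample = ["xx yy 2 4", "xx yy 1 5", "xx yy 5 39", "xx yy 8 3"]
    for line in sort_records(sample):
        print(line)
```

On the sample above, the expected order is 8 3, 5 39, 2 4, 1 5. A purely alphabetical sort would instead compare "39" and "4" character by character, which is the symptom described below.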
It doesn't sort numerically, only alphabetically.
I also tried:
-D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator
-D mapreduce.partition.keycomparator.options='-k1,2 -k3,3nr -k4,4nr'
but got an error saying that -k3,3nr is not a valid parameter.
Ideas?