I want all the values whose keys have at least one integer in common to be gathered in the same reduce function. For example, all the values that correspond to the key "1,2" and all the values that correspond to the key "2,3" must always end up in the same reduce function, because these two keys have the integer 2 in common.
In other words, I want to replace the "key equality condition" with a different condition.
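To make the condition concrete, here is a minimal sketch in plain Java (no Hadoop classes involved). The class name `KeyOverlap` and its methods are hypothetical names used only for illustration, and the sketch assumes every key is a comma-separated list of integers such as "1,2".

```java
import java.util.HashSet;
import java.util.Set;

public class KeyOverlap {
    // Parse a key such as "1,2" into the set of integers it contains.
    // Assumption: keys are comma-separated integers.
    public static Set<Integer> parse(String key) {
        Set<Integer> ints = new HashSet<>();
        for (String part : key.split(",")) {
            ints.add(Integer.parseInt(part.trim()));
        }
        return ints;
    }

    // True when the two keys have at least one integer in common.
    public static boolean sharesInteger(String a, String b) {
        Set<Integer> common = parse(a);
        common.retainAll(parse(b)); // keep only the integers present in both keys
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(sharesInteger("1,2", "2,3")); // true
        System.out.println(sharesInteger("2,3", "3,4")); // true
        System.out.println(sharesInteger("1,2", "3,4")); // false
    }
}
```

Note that this condition is not transitive: "1,2" and "3,4" share no integer, yet both overlap with "2,3". So it is not a drop-in replacement for equality, and I assume that keys linked through a chain like this would also have to end up together.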
Is there a way to do this? Is it related to the Partitioner class, or do I have to do something completely different?
I am using Hadoop 1.2.1, if that matters.
Thanks in advance!
I agree that I have only one Reducer function per job. However, when I run Hadoop as a simulation in NetBeans (not in distributed mode), it creates one reducer task for each unique key. For instance, if I have only 3 keys (k1, k2, k3), it calls the reduce function 3 times, once for each of these keys.
Therefore, the values that correspond to key k1 can be accessed only from that reducer task, and the same holds for the values of k2 and k3. What I want is to gather k1 and k2 into the same task (assuming that these two keys have something in common), so that I can access all the values that correspond to k1 and k2 from a single reducer task.
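One idea I have considered, shown here only as a rough sketch under stated assumptions: compute a group representative for every key with a union-find structure, so that keys connected through shared integers get the same representative, and then emit that representative as the map output key instead of the original key. The class `KeyGroups` and its methods are hypothetical, it is plain Java rather than Hadoop API, and it assumes all keys can be seen in advance (for example in an earlier pass over the data).

```java
import java.util.HashMap;
import java.util.Map;

public class KeyGroups {
    // key -> parent key; a key that is its own parent is a group representative
    private final Map<String, String> parent = new HashMap<>();
    // integer -> first key seen that contains this integer
    private final Map<Integer, String> firstKeyWith = new HashMap<>();

    // Return the representative of the key's group.
    // Assumption: the key was registered earlier with add().
    public String find(String key) {
        String p = parent.get(key);
        if (p.equals(key)) {
            return key;
        }
        String root = find(p);
        parent.put(key, root); // path compression
        return root;
    }

    // Register a key and merge its group with every earlier key sharing an integer.
    public void add(String key) {
        if (!parent.containsKey(key)) {
            parent.put(key, key);
        }
        for (String part : key.split(",")) {
            int n = Integer.parseInt(part.trim());
            String other = firstKeyWith.get(n);
            if (other == null) {
                firstKeyWith.put(n, key);
            } else {
                parent.put(find(key), find(other)); // union the two groups
            }
        }
    }

    public static void main(String[] args) {
        KeyGroups groups = new KeyGroups();
        groups.add("1,2");
        groups.add("2,3");
        groups.add("5,6");
        System.out.println(groups.find("1,2").equals(groups.find("2,3"))); // true
        System.out.println(groups.find("1,2").equals(groups.find("5,6"))); // false
    }
}
```

If the mapper emitted `find(key)` as its output key, all the values of "1,2" and "2,3" would arrive under one key and therefore in one reduce call. I am not sure this is the intended way to do it in Hadoop, which is why I am asking.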
In addition, I read this example and thought I understood it, until I ran it and saw that it again creates 2 reducer tasks and not 3, which is the number of age groups in the partitioner.