How do I chain an additional reduce step in Hadoop (mrjob)?
I am having trouble understanding Hadoop. I have two files, and I first did a join between them. One file lists countries and the other lists the clients in each country.
For example, clients.csv:
Bertram Pearcy ,bueno,SO
Steven Ulman ,regular,ZA
countries.csv:
Name,Code
Afghanistan,AF
Åland Islands,AX
Albania,AL
…
I wrote one map/reduce job that counts how many "good" (bueno) clients each country (ZA, SO) has, and with countries.csv I can tell which country each code refers to.
My steps are:
def steps(self):
    # Order the operations for execution.
    return [
        MRStep(mapper=self.mapper,
               reducer=self.reducer),
        MRStep(mapper=self.mapper1,
               combiner=self.combiner_cuenta_palabras,
               reducer=self.reducer2),
    ]
The result of my map/reduce is:
["South Georgia and the South Sandwich Islands"] 1
["South Sudan"] 1
["Spain"] 3
Now I would like to know which country has the highest count.
I added one more reducer:
def reducer3(self, _, values):
    yield _, max(values)

def steps(self):
    # Order the operations for execution.
    return [
        MRStep(mapper=self.mapper,
               reducer=self.reducer),
        MRStep(mapper=self.mapper1,
               combiner=self.combiner_cuenta_palabras,
               reducer=self.reducer2),
        MRStep(
            # mapper=self.mapper3,
            reducer=self.reducer3),
    ]
But I get the same answer as without that reducer.
I tried to extend my map/reduce program by adding another reducer, but it does not work.
With my first reducer I got:
A, 10
C, 2
D, 5
Now I would like to use that result to get only: A, 10
Additional comment:
INPUT [File1] + [File2] =>
MAP/REDUCE => OUT
Now I need an additional map/reduce step (reusing what I already did) that gives me other answers.
First: one and only one answer. Example: 3 Spain
Second: all the entries that share the highest number, e.g. 3 Spain and 3 Guam.
I tried to use:
def reducer3(self, _, values):
    yield _, max(values)
And I added it as a third step:
def steps(self):
    # Order the operations for execution.
    return [
        MRStep(mapper=self.mapper,
               reducer=self.reducer),
        MRStep(mapper=self.mapper1,
               combiner=self.combiner_cuenta_palabras,
               reducer=self.reducer2),
        MRStep(reducer=self.reducer3),
    ]
But I still get the same result. I know that reducer3 is being used, because if I write max(values)+1000 I get the same result with 1000 added to each number: 1001, 1003.
Your reducer is getting 3 distinct keys, so you're finding the max of each key separately, and values only has one element per key (try printing its length...). Therefore, you get 3 results.

You need a third mapper that returns (None, f'{key}|{value}'), for example. Then all records will be sent to one reducer, where you can iterate, parse, and aggregate the results.

That will only return one result for all values. If you want to capture equal max values, I think you'll need to iterate over the list more than once, then yield within a loop over the max elements found.
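To make the idea concrete, here is a minimal sketch in plain Python rather than a drop-in mrjob job. The run_step helper is a hypothetical local stand-in for one map/shuffle/reduce step, and every function name and the sample counts are assumptions for illustration. Instead of the f'{key}|{value}' string, it emits a (count, country) pair under the None key, which avoids the parsing step and lets max compare by count first. In a real job these functions would be methods taking self and would be wired up with MRStep(mapper=..., reducer=...).

```python
from itertools import groupby


def run_step(records, mapper, reducer):
    """Hypothetical local stand-in for one map/shuffle/reduce step."""
    mapped = [kv for key, value in records for kv in mapper(key, value)]
    # Shuffle: group the values by key before calling the reducer once per key.
    mapped.sort(key=lambda kv: repr(kv[0]))
    out = []
    for _, group in groupby(mapped, key=lambda kv: repr(kv[0])):
        group = list(group)
        out.extend(reducer(group[0][0], (v for _, v in group)))
    return out


def identity_mapper(key, value):
    yield key, value


def reducer_max_per_key(key, values):
    # What reducer3 in the question does: one max per distinct key.
    yield key, max(values)


def mapper_to_single_key(country, count):
    # Same key for every record, so a single reducer call sees them all.
    yield None, (count, country)


def reducer_single_max(_, pairs):
    # Pairs compare by count first; equal counts fall back to the name.
    yield max(pairs)


def reducer_all_max(_, pairs):
    pairs = list(pairs)  # the values iterator is one-pass, so materialize it
    best = max(count for count, _ in pairs)
    for count, country in pairs:
        if count == best:
            yield count, country


counts = [("Spain", 3), ("Guam", 3), ("South Sudan", 1)]
unchanged = run_step(counts, identity_mapper, reducer_max_per_key)
single = run_step(counts, mapper_to_single_key, reducer_single_max)
ties = run_step(counts, mapper_to_single_key, reducer_all_max)
print(unchanged, single, ties)
```

With distinct keys the extra step returns every country unchanged, which matches what the question observed. With the shared None key, reducer_single_max gives one answer and reducer_all_max gives every country tied for the highest count.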