Hadoop MapReduce - Join of two files and Computation on grouped values


I am fairly new to Hadoop and MapReduce programming. I want to know whether it is possible to group by another value (not the key) after joining two files.

I have two files containing the following data:

File1

name    gender
A       Male
B       Male
C       Female

File2

name    marks
A       25
B       28
A       30
C       22

Is there any way to find the percentage of marks for each gender? I am trying to get the following output:

Male    percentage_of_marks_of_male_students
Female  percentage_of_marks_of_female_students

Is there any way to do this in a single job? I've tried using two jobs for this, but couldn't make any headway.

Any tips would be appreciated.

Edit:

After joining the files I get something like this:

{name1 - ["gender","marks1","marks2",...]}
{name2 - ["gender","marks1","marks2",...]}
{name3 - ["gender","marks1","marks2",...]}
...

I'm currently stuck at finding the sum of marks of males and females separately in the reduce phase.
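
For illustration, here is a minimal sketch of how one reduce call over a joined record could re-key its output by gender. It is plain Java that only simulates the reduce step (no Hadoop classes), and the names JoinReduceSketch and reduceJoined are hypothetical. It assumes marks are non-negative integers and that any non-numeric value is the gender; a real join would more likely tag each value with its source file.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the reduce step; no Hadoop classes are used.
class JoinReduceSketch {

    // Mimics one reduce() call on a joined record:
    // values = [gender, marks1, marks2, ...] in any order.
    // Returns the re-keyed pair (gender, sum of that student's marks).
    static Map.Entry<String, Integer> reduceJoined(String name, List<String> values) {
        String gender = null;
        int sum = 0;
        for (String v : values) {
            if (v.matches("\\d+")) {
                sum += Integer.parseInt(v);   // assumption: numeric values are marks
            } else {
                gender = v;                   // assumption: anything else is the gender
            }
        }
        return new AbstractMap.SimpleEntry<>(gender, sum);
    }

    public static void main(String[] args) {
        // Joined records built from the sample data in the question.
        Map<String, List<String>> joined = new LinkedHashMap<>();
        joined.put("A", Arrays.asList("Male", "25", "30"));
        joined.put("B", Arrays.asList("Male", "28"));
        joined.put("C", Arrays.asList("Female", "22"));

        for (Map.Entry<String, List<String>> rec : joined.entrySet()) {
            Map.Entry<String, Integer> out = reduceJoined(rec.getKey(), rec.getValue());
            System.out.println(out.getKey() + "\t" + out.getValue());
        }
        // prints:
        // Male    55
        // Male    28
        // Female  22
    }
}
```

Because the output key is now the gender rather than the name, a later grouping step would see all marks for one gender together.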

Edit:

I have solved the problem using two jobs. The first job joins the two files and gives output as

[gender, the sum of marks of each student]

I then sent that output file as input to the second job, which gives the percentage of marks by gender.
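
A rough sketch of what the second stage computes, again as plain Java rather than Hadoop code. The class name GenderPercentageSketch, the tab-separated "gender<TAB>sum" line format, and the reading of "percentage of marks" as each gender's share of the grand total are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java simulation of the second job; no Hadoop classes are used.
class GenderPercentageSketch {

    // Input: lines of the form "gender<TAB>sum" (assumed first-job output format).
    // Output: gender -> share of the grand total of marks, in percent.
    static Map<String, Double> percentages(List<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        int grandTotal = 0;
        for (String line : lines) {
            String[] parts = line.split("\t");
            int marks = Integer.parseInt(parts[1].trim());
            totals.merge(parts[0], marks, Integer::sum); // group by gender
            grandTotal += marks;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            // Note: yields NaN if grandTotal is 0 (no marks at all).
            result.put(e.getKey(), 100.0 * e.getValue() / grandTotal);
        }
        return result;
    }

    public static void main(String[] args) {
        // Per-student sums for the sample data: A=55 (Male), B=28 (Male), C=22 (Female).
        List<String> firstJobOutput = Arrays.asList("Male\t55", "Male\t28", "Female\t22");
        for (Map.Entry<String, Double> e : percentages(firstJobOutput).entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%s\t%.2f", e.getKey(), e.getValue()));
        }
        // prints:
        // Male    79.05
        // Female  20.95
    }
}
```

In an actual job the grand total spans every key, so this step would presumably need all records to reach one place (for example a single reducer) before the percentages can be computed.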
