I am working with Pig 0.12.1 and MapR. I am trying to find the max of a field after grouping the relation on another field. Refer to the following Pig script; the structure of each relation is shown in the comments:
r1 = foreach SomeRelation generate flatten(group) as (c1 , c2);
-- r1: {c1: biginteger,c2: biginteger}
r2 = group r1 by c1;
-- r2: {group: chararray,r1: {(c1: chararray,c2: biginteger)}}
DUMP r2;
/* output -
1234|{(1234,9876)}
2345|{(2345,8765)}
3456|{(3456,7654)}
4567|{(4567,6543)}
*/
r3 = foreach r2 generate group as c1, MAX(r1.c2) as c2;
I am getting the following error:
Could not infer the matching function for org.apache.pig.builtin.MAX as multiple or none of them fit. Please use an explicit cast.
Script explained:
I am flattening the group of SomeRelation into c1 and c2, and then regrouping on c1 to get the max of c2 within each c1 group.
Any suggestions?
Well, it looks like the problem is that Pig doesn't allow MAX (or, for that matter, other aggregate functions such as SUM) on biginteger. I had to use long as the datatype for this to work.
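A minimal sketch of the long workaround, assuming every c2 value fits in a signed 64-bit long (larger values would not survive the cast intact). The relation name r1_long is hypothetical; the point is that the cast happens before the GROUP, so MAX receives a bag of long values instead of biginteger values.

```pig
r1 = foreach SomeRelation generate flatten(group) as (c1, c2);

-- Cast c2 from biginteger to long so the builtin MAX can resolve a matching implementation.
-- Assumption: all c2 values fit in a 64-bit signed long.
r1_long = foreach r1 generate c1, (long) c2 as c2;

r2 = group r1_long by c1;

-- MAX now sees a bag of longs, which it supports.
r3 = foreach r2 generate group as c1, MAX(r1_long.c2) as c2;
DUMP r3;
```

Casting once before grouping keeps the aggregation step unchanged apart from the relation name.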
Strangely, there's no documentation highlighting this limitation for the biginteger and bigdecimal datatypes.