How to sum a value column per distinct key in Apache Pig


I'm trying to compute the total ordered quantity for each product_id.

The data has the schema (product_id, quantity), for example:

(11, 5)
(11, 2)
(11, 1)
(12, 9)
(12, 1)
(13, 5)
(13, 9)
(13, 9)

A product_id can appear several times, because a product may be ordered many times, each time with a different quantity.

How do I sum the quantity for each product_id?

The expected output is:

(11, 8)
(12, 10)
(13, 23)

Answer by Janani

You can use GROUP BY and then SUM the quantity field of each group.

Let's say your input data is loaded into a relation called products.

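count_prod = FOREACH (GROUP products BY product_id) {
    -- keep one record per group so product_id can be carried through
    get_one_record = LIMIT products 1;
    -- SUM must reference the quantity field of the grouped bag (products.quantity)
    GENERATE FLATTEN(get_one_record), SUM(products.quantity) AS total_quantity;
};
-- FLATTEN prefixes the fields with the inner alias, hence get_one_record::product_id
final_products_data = FOREACH count_prod GENERATE get_one_record::product_id AS product_id, total_quantity;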
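As a minimal sketch (not part of the original answer), the LIMIT/FLATTEN step can be skipped: after GROUP, the key is already available as the built-in field `group`, so a plain FOREACH is enough. The file name `orders.csv`, the comma delimiter, and the int types are assumptions for illustration only; adjust the LOAD statement to match your actual data.

```pig
-- Assumption: 'orders.csv' is a hypothetical comma-separated file
-- with one product_id,quantity pair per line, e.g. "11,5".
products = LOAD 'orders.csv' USING PigStorage(',') AS (product_id:int, quantity:int);

-- Each output tuple of GROUP is (group, {bag of matching products tuples}).
grouped = GROUP products BY product_id;

-- 'group' holds the product_id key; SUM adds up the quantity column of the bag.
totals = FOREACH grouped GENERATE group AS product_id, SUM(products.quantity) AS total_quantity;

DUMP totals;
```

On the sample data this should produce (11,8), (12,10) and (13,23), although Pig does not guarantee the output order unless you add an ORDER BY.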

Hope this helps.