How to sum a specific column when key matches in PIG

I have sample data as shown below:

(id,code,key,value)
1,A,p,10
2,B,q,20
3,B,p,30
3,B,q,20
3,C,t,60
3,C,q,20

      

After loading this into Pig, I need output like the following:

O/P:

(A,{(p,10)})

(B,{(q,40),(p,30)})

(C,{(t,60),(q,20)})

      

The id can be dropped. For each code, the values of all rows that share the same key should be summed. In the example above, code B has key q twice with value 20, so the two are added and become (q,40).

Below is my code, but it does not give the result I need:

Lo = load 'pivot.txt' using PigStorage (',') as (id:chararray, code:chararray, key:chararray, value:int);
Aa = group Lo by (code);
Bb = foreach Aa {AUX = foreach Lo generate $0,$2,$3;generate group, AUX;}

dump Bb:
(A,{(1,p,10)})
(B,{(3,q,20),(3,p,30),(2,q,20)})
(C,{(3,t,60),(3,q,20)})

      

I can't get any further. Any help is greatly appreciated.

Thanks, Rohith



1 answer


Pig Script:

input_data = LOAD 'input.csv' USING PigStorage(',') AS (id:int,code:chararray,key:chararray,value:int);
-- Sum value for every distinct (code,key) pair; id is dropped here
req_stats = FOREACH(GROUP input_data BY (code,key)) GENERATE FLATTEN(group) AS (code,key), SUM(input_data.value) AS value;
-- Regroup by code and keep only the (key,value) pairs in each bag
req_stats_fmt = FOREACH(GROUP req_stats BY code) GENERATE group AS code, req_stats.(key,value);
DUMP req_stats_fmt;

      

Input:



1,A,p,10
2,B,q,20
3,B,p,30
3,B,q,20
3,C,t,60
3,C,q,20

      

Result of DUMP req_stats_fmt:

(A,{(p,10)})
(B,{(q,40),(p,30)})
(C,{(t,60),(q,20)})
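
To see why this works, it may help to look at the intermediate relation on its own. The following is only a sketch of the first grouping pass, split into two statements; the alias by_code_key is a name made up for illustration, and 'input.csv' is assumed to hold the six input rows shown above.

```pig
-- Same LOAD as in the script above
input_data = LOAD 'input.csv' USING PigStorage(',') AS (id:int, code:chararray, key:chararray, value:int);

-- Group on the composite key (code,key); by_code_key is a hypothetical alias
by_code_key = GROUP input_data BY (code, key);

-- FLATTEN turns the group tuple back into two columns; SUM adds up the values in each bag
req_stats = FOREACH by_code_key GENERATE FLATTEN(group) AS (code, key), SUM(input_data.value) AS value;

DUMP req_stats;
```

With the sample input this should give five rows, (A,p,10), (B,p,30), (B,q,40), (C,q,20) and (C,t,60), in no guaranteed order. The two (B,q,20) rows have already been merged at this point, so the second GROUP ... BY code only has to collect the (key,value) pairs into one bag per code.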

      
