Pig: sum and division when creating a grouped relation

I am writing a Pig script that loads a file whose fields are separated by tabs,

e.g.: name TAB year TAB count TAB ...

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

-- Group by type
grouped = GROUP file BY type;

-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);

group_operat = FOREACH by_type GENERATE  
        SUM(match_count) AS sum_m,
        SUM(volume_count) AS sum_v,
       (float)sum_m/sm_v;

DUMP group_operat;

      

The problem is with the grouped relation I'm trying to create. I want to sum all match counts, sum all volume counts, and divide the total match count by the total volume count.

What am I doing wrong in my arithmetic / relation creation? The error I'm getting is:

<line 7, column 11> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatible schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"

Thanks.

2 answers


Try this; it returns each type together with the ratio of its summed match count to its summed volume count.

Updated, working code:

input.txt

A       2001     10      2
A       2002     20      3
B       2003     30      4
B       2004     40      1

      



PigScript:

file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
    sum_m = SUM(file.match_count);
    sum_v = SUM(file.volume_count);
    GENERATE group, (float)(sum_m/sum_v) AS sum_mv;
}
DUMP group_operat;

      

Output:

(A,6.0)
(B,14.0)
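
As for the ERROR 1031 in the question: after GROUP, each record has only two fields, the key (named group) and a bag named after the input relation (file). FLATTEN(group) therefore yields a single chararray, which cannot be matched against the four names in the AS clause. A minimal sketch for checking this yourself, assuming the same input.txt as above:

```pig
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY type;
-- Print the schema of the grouped relation: one field named group (the key)
-- and one bag named file that holds the original four-field tuples.
DESCRIBE grouped;
-- Aggregates such as SUM take a bag, so they are applied to file.match_count;
-- there is no top-level match_count field after GROUP.
```

This is also why the working script refers to file.match_count and file.volume_count inside the FOREACH instead of flattening first.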

      

Try this:

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

grouped = GROUP file BY (type,year);

group_operat = FOREACH grouped GENERATE group,
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
       (float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;

      



The script above groups the results by type and year. If you only want to group by type, remove year from the grouping key:

grouped = GROUP file BY type;

group_operat = FOREACH grouped GENERATE group,file.year,
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
       (float)(SUM(file.match_count)/SUM(file.volume_count)) as sum_mv;
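
Note that file.year in the script above projects a bag containing every year in the group, so each output row carries a bag of years, not a single year. If that column is not needed, the same script can be written without it. This is a sketch under that assumption, not part of the original answer; the DUMP at the end is added only to show the result:

```pig
file = LOAD 'file.csv' USING PigStorage('\t') AS (type: chararray, year: chararray,
match_count: float, volume_count: float);

grouped = GROUP file BY type;

-- One row per type: the key, both sums, and their ratio.
group_operat = FOREACH grouped GENERATE group,
        SUM(file.match_count) AS sum_m,
        SUM(file.volume_count) AS sum_v,
        (float)(SUM(file.match_count)/SUM(file.volume_count)) AS sum_mv;

DUMP group_operat;
```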

      
