Conditional statements in Pig

I think I already know the answer to this question, but I just wanted to check here before I give up and do something ugly.

I have a query that is supposed to count the total number of clicks as well as the total number of users. The user count uses the code below; the click count would be the same code without the DISTINCT:

report              = FOREACH report GENERATE user, genre, title;
report              = DISTINCT report;
report              = GROUP report BY (genre, title);

      

My question is essentially: is there a way to write a conditional statement that would skip the DISTINCT step in the process? Pseudo:

report              = FOREACH report GENERATE user, genre, title;
if $report_type == 'users':
    report              = DISTINCT report;
end if
report              = GROUP report BY (genre, title);

      

I would rather not have two separate files, and so far the only solutions I can find involve using a Python shell, etc., to handle it dynamically. I'd rather keep everything in a simple .pig file, but I can't find a way to do this.



1 answer


One option is to try something like the script below. Can you check it against your input?

input:

user1,action,aa
user2,comedy,cc
user3,drama,dd
user1,action,aa
user1,action,aa
user2,comedy,cc

      

PigScript:

A = LOAD 'input' USING PigStorage(',') AS (user, genre, title);
B = FOREACH A GENERATE user, genre, title;
C = GROUP B BY (genre, title);
-- For each (genre, title) group, build both the raw bag and the de-duplicated bag.
D = FOREACH C {
    noDistValue = FOREACH B GENERATE user, genre, title;
    distValue = DISTINCT B;
    GENERATE $0 AS grp, noDistValue, distValue;
}
-- $report_type is substituted before the script runs, so this picks one bag per run.
E = FOREACH D GENERATE grp, (('$report_type' == 'users') ? distValue : noDistValue) AS mybag;
DUMP E;

      



Output1:
→ pig -x local -param "report_type=users" test.pig

((action,aa),{(user1,action,aa)})
((comedy,cc),{(user2,comedy,cc)})
((drama,dd),{(user3,drama,dd)})

      

Output2:
→ pig -x local -param "report_type=nonusers" test.pig

((action,aa),{(user1,action,aa),(user1,action,aa),(user1,action,aa)})
((comedy,cc),{(user2,comedy,cc),(user2,comedy,cc)})
((drama,dd),{(user3,drama,dd)})
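
Both runs above pass report_type on the command line. If the parameter is left out, Pig stops with an undefined-parameter error before the script runs. A minimal sketch, assuming Pig's %default preprocessor statement and treating 'users' as an arbitrary choice of fallback value, would add one line at the top of test.pig:

```pig
-- Assumption: 'users' is picked here as the fallback; a -param given on the
-- command line still takes precedence over this default.
%default report_type users

A = LOAD 'input' USING PigStorage(',') AS (user, genre, title);
```

With this in place, running pig -x local test.pig without -param should behave like the report_type=users run.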

      

If you want to compute the counts, project them from relation E. You can also modify the script above to suit your needs.
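
As a rough sketch of that projection, the lines below could be appended to the script in place of DUMP E. The alias F and the field name n are hypothetical names introduced here for illustration. COUNT is Pig's built-in bag aggregate.

```pig
-- Continue from relation E in the script above.
-- FLATTEN turns the (genre, title) group tuple back into two columns;
-- COUNT(mybag) counts the tuples in whichever bag the bincond selected.
F = FOREACH E GENERATE FLATTEN(grp) AS (genre, title), COUNT(mybag) AS n;
DUMP F;
```

With the sample input, n should match the number of tuples in each bag shown in the outputs above: 1 per group for report_type=users, and 3, 2 and 1 for action, comedy and drama otherwise.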
