Conditional statements in Pig
I think I already know the answer to this question, but I just wanted to check here before I give up and do something ugly.
I have a query that is supposed to count the total number of clicks as well as the total number of users. The click count uses the same code as the user count, only without the DISTINCT step:
report = FOREACH report GENERATE user, genre, title;
report = DISTINCT report;
report = GROUP report BY (genre, title);
My question is essentially: is there a way to write a conditional statement that would skip the DISTINCT step in the process? Pseudo:
report = FOREACH report GENERATE user, genre, title;
if $report_type == 'users':
report = DISTINCT report;
end if
report = GROUP report BY (genre, title);
I would rather not have two separate files, and up to this point the only solutions I can find involve using a Python shell, etc., to deal with it dynamically. I'd rather save everything in a simple .pig file, but can't seem to find a way to do this.
One option is something like the following. Can you try it with your input?
input:
user1,action,aa
user2,comedy,cc
user3,drama,dd
user1,action,aa
user1,action,aa
user2,comedy,cc
PigScript:
A = LOAD 'input' USING PigStorage(',') AS (user, genre, title);
B = FOREACH A GENERATE user, genre, title;
C = GROUP B BY (genre, title);
D = FOREACH C {
    noDistValue = FOREACH B GENERATE user, genre, title;
    distValue = DISTINCT B;
    GENERATE $0 AS grp, noDistValue, distValue;
}
E = FOREACH D GENERATE grp,(('$report_type' == 'users')?distValue:noDistValue) AS mybag;
DUMP E;
Output1:
→ pig -x local -param "report_type = users" test.pig
((action,aa),{(user1,action,aa)})
((comedy,cc),{(user2,comedy,cc)})
((drama,dd),{(user3,drama,dd)})
Output2:
→ pig -x local -param "report_type = nonusers" test.pig
((action,aa),{(user1,action,aa),(user1,action,aa),(user1,action,aa)})
((comedy,cc),{(user2,comedy,cc),(user2,comedy,cc)})
((drama,dd),{(user3,drama,dd)})
If you want the counts, project relation E and apply COUNT to mybag. You can also modify the script above to suit your needs.
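If only the counts are needed, the same idea can be sketched more compactly by computing both numbers in one nested FOREACH and picking one with the bincond. This is only a sketch, assuming the same comma-separated input and the same report_type parameter as above; the aliases uniq, counts and report and the field names clicks, unique_users and total are made up for illustration.

```pig
-- Sketch only: alias and field names below are illustrative, not from the original script.
A = LOAD 'input' USING PigStorage(',') AS (user:chararray, genre:chararray, title:chararray);
C = GROUP A BY (genre, title);
counts = FOREACH C {
    uniq = DISTINCT A.user;                  -- unique users within this (genre, title) group
    GENERATE FLATTEN(group) AS (genre, title),
             COUNT(A)    AS clicks,          -- every input row is one click
             COUNT(uniq) AS unique_users;
};
-- Pick one of the two counts based on the substituted parameter.
report = FOREACH counts GENERATE genre, title,
         (('$report_type' == 'users') ? unique_users : clicks) AS total;
DUMP report;
```

With the sample input above, the (action, aa) group has three clicks from one user, so total should be 1 when report_type is users and 3 otherwise. Both counts are computed on every run, which is the price of keeping everything in a single .pig file.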