Ranking in Apache Pig

Is there a good way to rank rows by a column in Apache Pig after sorting them? It would be even better if the ranking handled ties.

A = LOAD 'file.txt' as (score:int, name:chararray);
B = foreach A generate score, name order by score;
....

      

+3
apache-pig




4 answers


Try the RANK operator:

A = load 'data' AS (f1:chararray,f2:int,f3:chararray);

DUMP A;
(David,1,N)
(Tete,2,N)
B = rank A;

dump B;
(1,David,1,N)
(2,Tete,2,N)

      



Link: https://blogs.apache.org/pig/entry/apache_pig_it_goes_to

+2




I think you could use the ORDER BY operator:

B = ORDER A BY score DESC;

      



or

B = ORDER A BY score ASC;

      

0




You need to combine both solutions: sort first, then rank.

B = ORDER A BY score DESC;
C = rank B;

      

Let's say you want the row with the second-largest score:

D = filter C by $0 == 2;

      

0




You can use RANK in Pig, and it will handle ties as well. However, when using RANK only one reducer will be used, so performance will be affected.
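
As a rough sketch of the tie handling mentioned above (assuming Pig 0.11 or later, where RANK was introduced, and reusing the file name and schema from the question), RANK with a BY clause sorts and ranks in one step, and the optional DENSE keyword controls whether ranks are skipped after a tie. The example rank sequences in the comments are illustrations, not output from a real run:

```pig
-- Assumes Pig 0.11 or later; 'file.txt' and the schema come from the question.
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- RANK ... BY sorts and ranks in one step; rows with equal scores share a rank.
-- Without DENSE, the rank after a tie skips ahead (for example 1, 2, 2, 4).
B = RANK A BY score DESC;

-- With DENSE, no ranks are skipped after a tie (for example 1, 2, 2, 3).
C = RANK A BY score DESC DENSE;

-- The rank is prepended as the first field, so $0 refers to it.
-- This keeps every row tied for the second-highest score.
D = FILTER C BY $0 == 2;

DUMP D;
```

Filtering on the dense rank is what makes "second largest" mean the second distinct score; with the non-dense form, rank 2 may not exist at all if two rows tie for first place.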

0








