Ranking in Apache Pig

Is there a good way to rank rows by a column in Apache Pig after sorting them? It would be even better if the ranking handled ties.

A = LOAD 'file.txt' as (score:int, name:chararray);
B = ORDER A BY score;
....

      

+3




4 answers


Try the RANK operator:

A = load 'data' AS (f1:chararray,f2:int,f3:chararray);

DUMP A;
(David,1,N)
(Tete,2,N)
B = rank A;

dump B;
(1,David,1,N)
(2,Tete,2,N)

      



Link: https://blogs.apache.org/pig/entry/apache_pig_it_goes_to

+2




I think you could use the ORDER BY statement:

B = ORDER A BY score DESC;

      



or

B = ORDER A BY score ASC;

      

0




You need to combine both solutions: sort first, then rank.

B = ORDER A BY score DESC;
C = rank B;

      

For example, to get the row with the second-largest score:

D = filter C by $0 == 2;

      

0




You can use RANK in Pig, and it handles ties as well. However, RANK uses only one reducer, so performance will suffer.
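To illustrate the tie handling, here is a minimal sketch. It assumes Pig 0.11 or later (where RANK is available) and reuses the schema and the 'file.txt' path from the question; the aliases B, C and D are only for illustration.

```pig
-- Load the data using the schema from the question.
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- RANK ... BY orders by the given field and gives equal scores the same rank.
-- Ranks after a tie skip ahead (for example 1, 2, 2, 4).
B = RANK A BY score DESC;

-- DENSE also gives tied rows the same rank, but leaves no gaps
-- (for example 1, 2, 2, 3).
C = RANK A BY score DESC DENSE;

-- The rank is prepended as the first field, so $0 selects it.
-- With DENSE, this returns every row tied for the second-largest score.
D = FILTER C BY $0 == 2;
DUMP D;
```

Note that a plain RANK without BY just numbers the rows in sequence, so rows with equal scores would get different numbers. If ties matter, RANK ... BY is probably the form you want.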

0








