Ranking in Apache Pig
Is there a good way to rank rows by a column in Apache Pig after sorting them? It would be even better if the ranking handled ties.
A = LOAD 'file.txt' as (score:int, name:chararray);
B = ORDER A BY score;
....
+3
Winter
4 answers
Try the RANK operator:
A = load 'data' AS (f1:chararray,f2:int,f3:chararray);
DUMP A;
(David,1,N)
(Tete,2,N)
B = rank A;
dump B;
(1,David,1,N)
(2,Tete,2,N)
Link: https://blogs.apache.org/pig/entry/apache_pig_it_goes_to
+2
Krishna kalyan
I think you could use the "ORDER BY" statement. And here is the link
B = ORDER A BY score DESC;
or
B = ORDER A BY score ASC;
0
sheimi
You need to combine both solutions:
B = ORDER A BY score DESC;
C = rank B;
Let's say you want the second largest:
D = filter C by $0 == 2;
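If several rows can share the same score, ORDER followed by a plain RANK numbers the rows sequentially, so rank 2 may land on a row tied for first place. A minimal sketch of one alternative, assuming the relation A from the question (score:int, name:chararray) and a Pig version that includes RANK (0.11 or later); the aliases R and S are made up for illustration:

```pig
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- Dense ranking by score, highest first; equal scores share a rank.
R = RANK A BY score DESC DENSE;

-- Keep every tuple that holds the second-highest distinct score.
-- $0 is the rank column that RANK prepends to each tuple.
S = FILTER R BY $0 == 2;
```

With this form the separate ORDER step is not needed, because RANK ... BY does the ordering itself.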
0
Krishna kalyan
You can use RANK in Pig, and it handles ties as well. However, RANK uses only one reducer, so performance will suffer.
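To illustrate the tie handling, here is a small sketch, again assuming the relation A from the question. The scores in the comments are hypothetical sample values, not data from the original post:

```pig
A = LOAD 'file.txt' AS (score:int, name:chararray);

-- Standard ranking: tied scores share a rank and the next rank is skipped.
-- For hypothetical scores 90, 85, 85, 70 the ranks are 1, 2, 2, 4.
B = RANK A BY score DESC;

-- Dense ranking: tied scores share a rank and no rank is skipped.
-- For the same scores the ranks are 1, 2, 2, 3.
C = RANK A BY score DESC DENSE;
```

A plain `RANK A;` without BY, by contrast, only numbers the tuples sequentially and does not look at ties.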
0
Narendra parmar