RDD.sortByKey using a function in python?

Let's say my key is not a simple data type but a class and I need to sort the keys using a compare function. In Scala, I can do this using new Ordering

. How can I achieve the same functionality in Python? For example, what would be the equivalent code in Python?

implicit val someClassOrdering = new Ordering[SomeClass] {
        override def compare(a: SomeClass, b: SomeClass) = a.compare(b)
    }

      

+3


source to share


3 answers


In Python, you can create class methods for comparison by a rich comparison methods :
__lt__

, __le__

, __eq__

, __ne__

, __gt__

,__ge__

You can do these methods for anything you need to compare instances of your class, even weird things, but it's a good idea to make them consistent if you want the sort to behave intelligently. :)



Here's a fairly simple example of how they are used in this answer I wrote a month ago: Sorting a list to form the largest possible number .

Here's another nice example from Finding a Partial Match in a List of Tuples that creates a lookup object.

+2


source


You can pass an argument keyfunc

:

from numpy.random import seed, randint
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

seed(1)
rdd = sc.parallelize(
    (Point(randint(10), randint(10)), randint(100)) for _ in range(5))

      

Now, let's say you want to sort the points by the y coordinate:



rdd.sortByKey(keyfunc=lambda p: p.y).collect()

      

and the result is:

[(Point(x=5, y=0), 16),
 (Point(x=9, y=2), 20),
 (Point(x=5, y=2), 84),
 (Point(x=1, y=7), 6),
 (Point(x=5, y=8), 9)]

      

+3


source


In Python, you can use a key

method argument sort()

. There was also a feature cmp

, but this solution was not optimal and is now deprecated (or even removed, depending on the Python version). Take a look here .

0


source







All Articles