Spark: merge sorted partitions
I would like to merge sorted partitions locally (on the driver). I called .mapPartitionsToPair() on my data, which produced an Iterable<Tuple2<D,X>>, where D is a type that has an ordering (let's say some kind of date) and X is a type with some merge rules. The result is ordered by D unambiguously.
I need my end result to be a reduction of these partitions, also ordered by D unambiguously. Is there a local reduction that relies on the input being ordered by key? Can I use any other approach to achieve my goal?
I am using Spark 1.1.0.
The simplest solution is sortByKey() followed by collect(). It doesn't take advantage of the data already being sorted, but the sort is scalable and fast, and it's easy to do.
But if you really want to rely on the data already being sorted, use glom() and then collect() to get a list of partitions. Then merge the sorted lists with, for example, Guava's Iterators.mergeSorted().
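To illustrate the merge step, here is a minimal plain-Java sketch of a k-way merge over already sorted lists, roughly what Iterators.mergeSorted() does, with one addition: entries with equal keys are combined with a merge rule. It assumes Java 8, uses neither Spark nor Guava, and the names SortedPartitionMerge, Cursor and mergeRule are made up for the example. Each inner list stands in for one partition as you would get it from glom() and collect(), and Map.Entry stands in for Tuple2<D,X>.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.PriorityQueue;
import java.util.function.BinaryOperator;

public class SortedPartitionMerge {

    // Hypothetical helper: the current head entry of one partition plus the rest of it.
    private static final class Cursor<K, V> {
        Entry<K, V> head;
        final Iterator<Entry<K, V>> rest;

        Cursor(Iterator<Entry<K, V>> it) {
            this.rest = it;
            this.head = it.next(); // caller guarantees the partition is non-empty
        }
    }

    /**
     * Merges partitions that are each sorted by key into one list sorted by key.
     * Entries with equal keys are combined with mergeRule. The order in which
     * equal keys from different partitions meet is not fixed, so this sketch
     * assumes mergeRule is associative and commutative.
     */
    public static <K extends Comparable<K>, V> List<Entry<K, V>> mergeSorted(
            List<List<Entry<K, V>>> partitions, BinaryOperator<V> mergeRule) {

        // Min-heap ordered by the key at the head of each partition.
        Comparator<Cursor<K, V>> byHeadKey =
                (a, b) -> a.head.getKey().compareTo(b.head.getKey());
        PriorityQueue<Cursor<K, V>> heap = new PriorityQueue<>(byHeadKey);

        for (List<Entry<K, V>> partition : partitions) {
            if (!partition.isEmpty()) {
                heap.add(new Cursor<>(partition.iterator()));
            }
        }

        List<Entry<K, V>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<K, V> cursor = heap.poll();
            Entry<K, V> entry = cursor.head;

            int last = out.size() - 1;
            if (last >= 0 && out.get(last).getKey().compareTo(entry.getKey()) == 0) {
                // Same key as the previous output entry: reduce in place.
                Entry<K, V> prev = out.get(last);
                V merged = mergeRule.apply(prev.getValue(), entry.getValue());
                out.set(last, new SimpleEntry<>(prev.getKey(), merged));
            } else {
                out.add(new SimpleEntry<>(entry.getKey(), entry.getValue()));
            }

            // Advance this partition and put it back on the heap if it has more entries.
            if (cursor.rest.hasNext()) {
                cursor.head = cursor.rest.next();
                heap.add(cursor);
            }
        }
        return out;
    }
}
```

With k partitions and n entries in total this takes O(n log k) comparisons, and it only ever looks at the head of each partition, so it relies on each partition being sorted rather than sorting again. Keep in mind that collect() brings all the data to the driver, so this only fits when the result fits in driver memory.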