How to transpose a dataset in Scala?
2 answers
One liner that I think works in Spark.
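val a = List(
  List('a', 'b', 'c', 'd'),
  List('e', 'f', 'g', 'h'),
  List('i', 'j', 'k', 'l'),
  List('m', 'n', 'o', 'p')
)
// sc is the SparkContext (predefined in spark-shell)
val b = sc.parallelize(a, 1)
b.flatMap(_.zipWithIndex)     // (element, index) pairs
  .groupBy(_._2)              // index -> pairs at that index
  .mapValues(_.map(_._1))     // keep only the element values
  .collectAsMap()             // bring the result to the driver as a Map
  .toList
  .sortBy(_._1)               // order by index
  .map(_._2)                  // drop the index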
//> List[Iterable[Char]] = List(
// List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))
Pair each element of each list with its index, then group by that index. This gives us a mapping of index -> <list of (element, index) pairs at that index>. Convert the values to lists of element values only. Then convert the result to a list (via a map, using collectAsMap, since an RDD doesn't have .toList) so we can sort it by index. Finally, sort by index and extract (with another map) only the element values.
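If you want to check the steps without a Spark cluster, here is a rough sketch of the same pipeline on a plain local List. The names TransposeSketch and transposeByIndex are made up for this example, and it assumes every inner list has the same length.

```scala
object TransposeSketch {
  // Hypothetical helper: mirrors the RDD steps above on a local List.
  // Assumes all rows have the same length.
  def transposeByIndex[A](rows: List[List[A]]): List[List[A]] =
    rows
      .flatMap(_.zipWithIndex)                              // (element, index) pairs
      .groupBy(_._2)                                        // index -> pairs at that index
      .map { case (idx, pairs) => idx -> pairs.map(_._1) }  // keep only the element values
      .toList
      .sortBy(_._1)                                         // order by index
      .map(_._2)                                            // drop the index

  def main(args: Array[String]): Unit = {
    val a = List(
      List('a', 'b', 'c', 'd'),
      List('e', 'f', 'g', 'h'),
      List('i', 'j', 'k', 'l'),
      List('m', 'n', 'o', 'p')
    )
    // Prints one transposed row per line.
    transposeByIndex(a).foreach(println)
  }
}
```

For a local rectangular List, the standard library's a.transpose should give the same result, so the index-and-group approach is mainly useful when the data lives in an RDD.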