How to transpose a dataset in Scala?

I want to transpose a dataset in Scala.

My CSV file:

a,b,c,d
e,f,g,h
i,j,k,l
m,n,o,p

      

I need a result like:

a,e,i,m
b,f,j,n
c,g,k,o
d,h,l,p

      



2 answers


A one-liner that I think works in Spark:

val a = List(
  List('a', 'b', 'c', 'd'),
  List('e', 'f', 'g', 'h'),
  List('i', 'j', 'k', 'l'),
  List('m', 'n', 'o', 'p')
)
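val b = sc.parallelize(a, 1) // sc is the SparkContext, e.g. as predefined in spark-shell

b.flatMap(_.zipWithIndex)   // pair each element with its column index
  .groupBy(_._2)            // group the pairs by column index
  .mapValues(_.map(_._1))   // keep only the elements in each group
  .collectAsMap()           // bring the result to the driver as a Map
  .toList
  .sortBy(_._1)             // order the columns by index
  .map(_._2)                // drop the index
//> List[Iterable[Char]] = List(
// List(a, e, i, m), List(b, f, j, n), List(c, g, k, o), List(d, h, l, p))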

      



Pair each element of each list with its index, then group by that index. This gives entries of the form 0 -> <list of (element, index) pairs at that index>. Convert the values to lists of elements only. Then convert the result to a list (via a map, using collectAsMap, since an RDD doesn't have .toList) so it can be sorted by index. Finally, sort by index and extract (with another map) only the element values.
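
The same index-and-group idea can be tried without a Spark cluster. Below is a minimal sketch on plain Scala collections. The names IndexGroupTranspose and transposeByIndex are made up for illustration, and it assumes every row has the same length.

```scala
object IndexGroupTranspose {
  // Transpose by tagging each element with its column index, grouping on
  // that index, and reading the groups back in index order.
  // Assumption: all rows have the same number of elements.
  def transposeByIndex[A](rows: List[List[A]]): List[List[A]] =
    rows
      .flatMap(_.zipWithIndex) // List[(A, Int)]: (element, column index)
      .groupBy(_._2)           // Map[Int, List[(A, Int)]], keyed by column index
      .toList
      .sortBy(_._1)            // a Map has no defined order, so sort by column index
      .map { case (_, pairs) => pairs.map(_._1) } // keep only the elements
}
```

Calling IndexGroupTranspose.transposeByIndex(a) on the 4x4 list above should give the same rows as the Spark version. On a local List, groupBy keeps elements in their original order within each group. A multi-partition RDD does not necessarily give that guarantee, which is presumably why the Spark snippet uses a single partition.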



Use the transpose method for this:



val a = List(
  List('a', 'b', 'c', 'd'),
  List('e', 'f', 'g', 'h'),
  List('i', 'j', 'k', 'l'),
  List('m', 'n', 'o', 'p')
)

a.transpose

//List(
//  List(a, e, i, m), 
//  List(b, f, j, n), 
//  List(c, g, k, o), 
//  List(d, h, l, p))
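
Since the question starts from a CSV file, here is a rough sketch that applies transpose to comma-separated lines. The names CsvTranspose and transposeCsv are hypothetical. It assumes a simple CSV with no quoted fields and the same number of columns in every row; it does not handle ragged rows.

```scala
object CsvTranspose {
  // Assumptions: fields contain no quotes or embedded commas, and every
  // line has the same number of fields.
  def transposeCsv(lines: List[String]): List[String] =
    lines
      .map(_.split(",", -1).toList) // split into cells; -1 keeps trailing empty cells
      .transpose                    // rows become columns
      .map(_.mkString(","))         // join each new row back into a CSV line
}
```

The input lines could come from something like scala.io.Source.fromFile(path).getLines().toList, where path is whatever file you are reading. This works on data that fits in memory on one machine; for a large distributed dataset the Spark approach is the one to look at.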

      
