Spark RDD: map one row of data to multiple rows

I have a text file with data that looks like this:

Type1 1 3 5 9
Type2 4 6 7 8
Type3 3 6 9 10 11 25

I would like to convert it to an RDD with lines like this:

1 Type1
3 Type1
3 Type3
......

I started with a case class:

case class MyData(uid: Int, gid: String)

I'm new to Spark and Scala, and I can't seem to find an example that does this.


1 answer


It sounds like you want flatMap, which lets one input line produce several output rows. Something like this:



rdd.flatMap { line =>
  // First token is the gid; each remaining token becomes one MyData row.
  line.split(' ').toList match {
    case gid :: rest => rest.map(x => MyData(x.toInt, gid))
    case Nil         => Nil // line with no tokens: emit nothing
  }
}
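
To see what this produces on the sample data from the question, here is a minimal sketch that uses a plain Scala List in place of the RDD, so it can be pasted into a Scala REPL or spark-shell. The names parseLine, sample and rows are hypothetical, chosen only for illustration. It assumes fields are separated by whitespace and that every token after the first is a valid integer.

```scala
case class MyData(uid: Int, gid: String)

// Hypothetical helper: turn one input line into zero or more MyData rows.
// Assumes whitespace-separated fields and integer uids (toInt throws otherwise).
def parseLine(line: String): List[MyData] =
  line.trim.split("\\s+").toList match {
    case gid :: rest => rest.map(x => MyData(x.toInt, gid))
    case Nil         => Nil // keeps the match exhaustive
  }

// The three lines from the question, standing in for the contents of the text file.
val sample = List(
  "Type1 1 3 5 9",
  "Type2 4 6 7 8",
  "Type3 3 6 9 10 11 25"
)

// List.flatMap flattens the per-line lists the same way RDD.flatMap does.
val rows = sample.flatMap(parseLine)
```

Here rows holds 14 elements, starting with MyData(1,Type1), MyData(3,Type1), MyData(5,Type1). On a real RDD[String] the same function can be passed directly, as in rdd.flatMap(parseLine).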
