Iterate over an RDD and look up each value in another RDD by key

I have two RDDs with the same structure: RDD[(String, (Int, scala.collection.immutable.Map[String,Int], Double))]

rdd1

(A,(1,Map(VVV -> 1),0.0))
(B,(26,Map(DDD -> 2, PPP -> 7, OOO -> 2, EEE -> 3, LLL -> 12),1.35))
(C,(2,Map(VVV -> 2),0.0))

      

rdd2

(OOO,(2,Map(B -> 2),0.0))
(DDD,(2,Map(B -> 2),0.0))
(PPP,(7,Map(B -> 7),0.0))
(LLL,(12,Map(B -> 12),0.0))
(VVV,(3,Map(C -> 2, A -> 1),0.63))
(EEE,(3,Map(B -> 3),0.0))

      

I need to iterate over rdd1 and, for each key in its maps ((VVV), (DDD, PPP, OOO, EEE, LLL), (VVV)), look up the entry with that key in rdd2, then call a function to do the computation.

How can this be done? Is it possible to iterate over one RDD and use a value from it as a key to look up in another RDD?

I tested using:

def calculate(t: String, c: Int, m: scala.collection.immutable.Map[String,Int], e: Double, r: org.apache.spark.rdd.RDD[(String, (Int, scala.collection.immutable.Map[String,Int], Double))]) = {    
    Tuple5(t,c,m,e,r.lookup("DDD"))
}
val newRDD = rdd1.map(f => calculate(f._1, f._2._1, f._2._2, f._2._3, rdd2))

      

When I execute newRDD.take(10).foreach(println(_)), the following error message appears:

14/11/10 13:30:46 ERROR Executor: Exception in task ID 54 scala.MatchError: null 
    at org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:572)

      

And one more test:

rdd1.foreach(a => { rdd2.foreach(b => { println(b)}) })

      

But it gives me the following error message:

14/11/10 13:35:23 ERROR Executor: Exception in task ID 55 java.lang.NullPointerException
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:715)

      



1 answer


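Spark does not allow one RDD to be used inside a transformation or action on another RDD, so calling rdd2.lookup or rdd2.foreach from within rdd1.map or rdd1.foreach fails on the executors. I would instead flatten your maps into tuples (giving an RDD with one entry for each map entry in the original rdd1) and then join with rdd2: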



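import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._ // pair-RDD implicits; only required before Spark 1.3

// One (mapKey, (rdd1Key, count, mapValue, double)) record per map entry in rdd1
val splitRdd1: RDD[(String, (String, Int, Int, Double))] =
  rdd1.flatMap { case (s, (i, map, d)) =>
    map.toList.map { case (k, v) => (k, (s, i, v, d)) }
  }
// Join on the map key, which is also the key of rdd2
val newRdd = splitRdd1.join(rdd2).map{...}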
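To make the flatten-then-join idea concrete, here is a minimal sketch that runs it on the sample data from the question in local mode. The names JoinSketch, Row, joinOnMapKeys and run, the local[*] master, and the projection in the final map are all assumptions for illustration; the real computation is not given in the question, so the last step only returns (rdd1 key, map key, map value, rdd2 double) as a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only required before Spark 1.3
import org.apache.spark.rdd.RDD

object JoinSketch {
  // Shape shared by rdd1 and rdd2 in the question.
  type Row = (String, (Int, Map[String, Int], Double))

  // Flatten rdd1 to one record per map entry, keyed by the map key,
  // then join those records against rdd2 on that key.
  def joinOnMapKeys(rdd1: RDD[Row], rdd2: RDD[Row]): RDD[(String, String, Int, Double)] = {
    val splitRdd1: RDD[(String, (String, Int, Int, Double))] =
      rdd1.flatMap { case (s, (i, map, d)) =>
        map.toList.map { case (k, v) => (k, (s, i, v, d)) }
      }
    // ASSUMPTION: placeholder projection standing in for the real computation.
    splitRdd1.join(rdd2).map {
      case (k, ((s, _, v, _), (_, _, d2))) => (s, k, v, d2)
    }
  }

  // Builds the sample RDDs from the question and returns the joined rows,
  // sorted by (rdd1 key, map key) so the order is deterministic.
  def run(): Array[(String, String, Int, Double)] = {
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq[Row](
        ("A", (1, Map("VVV" -> 1), 0.0)),
        ("B", (26, Map("DDD" -> 2, "PPP" -> 7, "OOO" -> 2, "EEE" -> 3, "LLL" -> 12), 1.35)),
        ("C", (2, Map("VVV" -> 2), 0.0))
      ))
      val rdd2 = sc.parallelize(Seq[Row](
        ("OOO", (2, Map("B" -> 2), 0.0)),
        ("DDD", (2, Map("B" -> 2), 0.0)),
        ("PPP", (7, Map("B" -> 7), 0.0)),
        ("LLL", (12, Map("B" -> 12), 0.0)),
        ("VVV", (3, Map("C" -> 2, "A" -> 1), 0.63)),
        ("EEE", (3, Map("B" -> 3), 0.0))
      ))
      joinOnMapKeys(rdd1, rdd2).collect().sortBy(t => (t._1, t._2))
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

With the sample data this yields seven rows, one per map entry in rdd1, starting with (A,VVV,1,0.63). Because the lookup is expressed as a join, no RDD is referenced inside another RDD's closure, which is what triggered the MatchError and NullPointerException in the two attempts above.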

      
