Accessing a key from mapValues or flatMapValues?

In Spark 1.3, is there a way to access the key from within mapValues?

In particular, if I have

val y = x.groupBy(someKey)
val z = y.mapValues(someFun)

      

can someFun find out which key of y it is currently working on?

Or do I need to do

val y = x.map(r => (someKey(r), r)).groupBy(_._1)
val z = y.mapValues{ case (k, r) => someFun(r, k) }

      

Note: the reason I want to use mapValues rather than map is to preserve the partitioning.

+3




3 answers


You cannot access the key from within mapValues. But you can preserve the partitioning with mapPartitions.

val pairs: RDD[(Int, Int)] = ???
pairs.mapPartitions({ it =>
  it.map { case (k, v) =>
    // your code
  }
}, preservesPartitioning = true)

      



Be careful to actually preserve the partitioning; the compiler cannot check it for you.
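Since the compiler cannot help here, one option is to check at runtime. The following is a minimal sketch, assuming a local SparkContext and made-up sample data (the object name, the `_ % 3` key function, and the summing logic are illustrative only). It compares the `partitioner` of the grouped RDD with that of the result, and contrasts it with a plain `map`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative object name; not taken from the question.
object PreservePartitioningCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("preserve-partitioning").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // groupBy shuffles the data and leaves a partitioner on the grouped RDD.
      val grouped = sc.parallelize(1 to 10).groupBy(_ % 3)

      // The keys are passed through unchanged, so the flag is legitimate here.
      val kept = grouped.mapPartitions(
        it => it.map { case (k, vs) => (k, vs.sum + k) },
        preservesPartitioning = true)

      // A plain map drops the partitioner even though the keys are unchanged.
      val dropped = grouped.map { case (k, vs) => (k, vs.sum + k) }

      assert(kept.partitioner == grouped.partitioner)
      assert(dropped.partitioner.isEmpty)
    } finally sc.stop()
  }
}
```

Note that this check only confirms the partitioner was carried over. It does not prove that the function left the keys alone; that remains your responsibility.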

+2




In this case, you can use mapPartitions with the preservesPartitioning parameter set to true.

x.mapPartitions((it => it.map { case (k,rr) => (k, someFun(rr, k)) }), preservesPartitioning = true)

      



You just need to make sure you are not changing the partitioning, i.e. do not change the key.
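If this pattern comes up often, it could be wrapped in a small helper so the key can never be changed by accident. The sketch below is only an illustration: `KeyAwareMapValues` and `mapValuesWithKey` are hypothetical names, not part of Spark's API.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper; Spark itself does not provide mapValuesWithKey.
object KeyAwareMapValues {
  // f sees both key and value but returns only the new value,
  // so the key (and therefore the partitioning) cannot change.
  def mapValuesWithKey[K, V, U](rdd: RDD[(K, V)])(f: (K, V) => U): RDD[(K, U)] =
    rdd.mapPartitions(
      it => it.map { case (k, v) => (k, f(k, v)) },
      preservesPartitioning = true)
}
```

With the names from the question, usage would look like `KeyAwareMapValues.mapValuesWithKey(y)((k, rs) => someFun(rs, k))`.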

+7




You can use zipWithIndex().map(lambda x: (x[1], x[0])).mapValues() after groupByKey() is executed. It will give you a (key, value) pair in the mapValues function.

0

