How can I pass an argument to a custom function for mapPartitions in Spark?

In Spark, you can pass a user-defined function to mapPartitions. My question is: how can I pass an extra argument to it? At the moment I call it with rdd.mapPartitions(userdefinedFunc), and the function looks like this:

def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])]): Iterator[(Long, Long)] =
{
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]

  // Code here

  res.iterator
}

However, I also want to pass a constant as an argument to this function, so that it looks like this:

def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])], someConstant: Long): Iterator[(Long, Long)] =
{
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]

  // Code here

  res.iterator
}

Now, how can I call this function with mapPartitions? I get an error if I just use rdd.mapPartitions(userdefinedFunc(someConstant)).

1 answer


Use a curried function, like:

def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[SAMRecord])]): Iterator[(Long, Long)]



Then userdefinedFunc(someConstant) will be a function of type (iter: Iterator[(Long, Array[SAMRecord])]) => Iterator[(Long, Long)], which is exactly the shape mapPartitions expects, so you can call rdd.mapPartitions(userdefinedFunc(someConstant)).
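To see the mechanics without a Spark cluster, here is a minimal, self-contained sketch in plain Scala. Array[Int] stands in for Array[SAMRecord] (which is a domain-specific type not defined here), and the body is a made-up computation just to show that the partially applied function has the right type:

```scala
// Sketch: currying turns a two-argument function into a one-argument
// function of the exact type mapPartitions accepts.
object CurryDemo {
  // Curried version: the constant comes first, the iterator second.
  def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[Int])]): Iterator[(Long, Long)] = {
    val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
    for ((key, records) <- iter)
      res += ((key, records.length.toLong + someConstant)) // placeholder logic
    res.iterator
  }

  def main(args: Array[String]): Unit = {
    // Partially applying fixes someConstant and yields a value of type
    // Iterator[(Long, Array[Int])] => Iterator[(Long, Long)].
    val f: Iterator[(Long, Array[Int])] => Iterator[(Long, Long)] =
      userdefinedFunc(10L)

    val out = f(Iterator((1L, Array(1, 2, 3)))).toList
    println(out) // List((1,13))
  }
}
```

With a real RDD this becomes rdd.mapPartitions(userdefinedFunc(someConstant)); Spark serializes the resulting function closure, which captures someConstant, and ships it to each partition.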
