How can I pass an argument to a custom function for mapPartitions in Spark?
In Spark, you can pass a user-defined function to mapPartitions. My question is: how can I pass an extra argument to that function? At the moment I call it with rdd.mapPartitions(userdefinedFunc), where the function looks like this:
def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])]) : Iterator[(Long, Long)] =
{
val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
// Code here
res.iterator
}
However, I also want to pass a constant as an argument to this function, so that it looks like this:
def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])], someConstant: Long) :
Iterator[(Long, Long)] =
{
val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
// Code here
res.iterator
}
Now, how can I call this function with mapPartitions? I get an error if I just use rdd.mapPartitions(userdefinedFunc(someConstant)).
Use a curried function, like:
def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[SAMRecord])]): Iterator[(Long, Long)]
Then userdefinedFunc(someConstant) is a function of type (iter: Iterator[(Long, Array[SAMRecord])]) => Iterator[(Long, Long)], which you can pass directly to mapPartitions: rdd.mapPartitions(userdefinedFunc(someConstant)).
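To illustrate, here is a minimal self-contained sketch of the curried form. It uses Iterator[(Long, Long)] in place of the SAMRecord-based iterator so it runs without Spark or htsjdk, and the body (adding someConstant to each value) is a placeholder standing in for your real per-partition logic:

```scala
// Curried version: the constant goes in the first parameter list,
// the partition iterator in the second.
def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Long)]): Iterator[(Long, Long)] = {
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
  // Placeholder body: add the constant to each value.
  for ((k, v) <- iter) res += ((k, v + someConstant))
  res.iterator
}

// Partially applying the first parameter list yields a plain
// Iterator => Iterator function, which is exactly the shape
// mapPartitions expects.
val f: Iterator[(Long, Long)] => Iterator[(Long, Long)] = userdefinedFunc(10L)

println(f(Iterator((1L, 2L), (3L, 4L))).toList)  // List((1,12), (3,14))
```

With a real RDD you would then write rdd.mapPartitions(userdefinedFunc(someConstant)), just as in the answer above.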