Why can I reference the SparkContext inside an RDD operation?
SparkContext is not serializable. It is meant to be used by the driver, so can anyone explain the following?
Using the Spark shell, on YARN, with Spark version 1.6.0:
val rdd = sc.parallelize(Seq(1))
rdd.foreach(x => print(sc))
Prints nothing on the client (the output appears on the executors)
Using the Spark shell, local master, and Spark 1.6.0:
val rdd = sc.parallelize(Seq(1))
rdd.foreach(x => print(sc))
Prints out "null" on the client
Using pyspark, local master, and Spark version 1.6.0:
rdd = sc.parallelize([1])
def _print(x):
    print(x)
rdd.foreach(lambda x: _print(sc))
Throws an exception
I've also tried the following:
Using the Spark shell and Spark version 1.6.0:
class Test(val sc:org.apache.spark.SparkContext) extends Serializable{}
val test = new Test(sc)
rdd.foreach(x => print(test))
Now it finally throws java.io.NotSerializableException: org.apache.spark.SparkContext
Why does it work in Scala when I only print sc? Why do I get a null reference when I expected a NotSerializableException (or so I thought)?
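The two behaviors above can at least be imitated outside of Spark. This is a minimal sketch, not Spark's actual code: `FakeContext` and `ReplLine` are hypothetical stand-ins. It assumes (a) that pyspark's SparkContext deliberately raises from its pickling hooks, which would explain the Python exception, and (b) that with Java-style serialization a field excluded from serialization (as with `@transient`) simply comes back as null, which would explain the Scala shell printing "null":

```python
import pickle

# Hypothetical stand-in for pyspark's SparkContext: raising from a
# pickling hook makes any attempt to ship the object to a worker fail.
class FakeContext:
    def __reduce__(self):
        raise RuntimeError("FakeContext cannot be pickled")

# Hypothetical stand-in for the REPL wrapper object that holds sc:
# dropping the field in __getstate__ mimics a @transient field under
# Java serialization, so it deserializes as None ("null").
class ReplLine:
    def __init__(self, ctx):
        self.sc = ctx  # the "transient" reference to the context

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["sc"]  # skipped during serialization
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.sc = None  # comes back as None on the "executor" side

# pyspark-style behavior: serialization itself blows up.
try:
    pickle.dumps(FakeContext())
except RuntimeError as e:
    print("pyspark-style failure:", e)

# Scala-shell-style behavior: serialization succeeds, but the
# reference is gone after the round trip.
copy = pickle.loads(pickle.dumps(ReplLine(FakeContext())))
print("Scala-shell-style result:", copy.sc)  # None
```

If this model is right, the difference between the three runs is not in foreach itself but in how each serializer treats the captured reference to sc: Python's pickler fails fast, while Java serialization of the shell's wrapper object silently drops the field.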