Why can I reference SparkContext inside an RDD operation?

SparkContext is not serializable; it is meant to be used only on the driver. Can anyone explain the following?

Using Spark Shell, on YARN, Spark version 1.6.0

val rdd = sc.parallelize(Seq(1))
rdd.foreach(x => print(sc))

Nothing is printed on the client; the output appears on the executors (in their stdout), not on the driver.
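
Since the executors' stdout is not shown on the client, one way to see what the closure actually receives (a minimal sketch, assuming the same spark-shell session and the rdd above) is to collect the observation back to the driver instead of printing on the executors:

// Sketch: report back whether sc is null inside the closure; collect()
// brings the result to the driver, so the println below is visible on the client
val seenByClosure = rdd.map(_ => if (sc == null) "sc is null on the executor" else sc.toString).collect()
seenByClosure.foreach(println)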

Using Spark Shell, local master, Spark version 1.6.0

val rdd = sc.parallelize(Seq(1))
rdd.foreach(x => print(sc))

Prints out "null" on the client

Using pyspark, local master, Spark version 1.6.0

rdd = sc.parallelize([1])
def _print(x):
    print(x)
rdd.foreach(lambda x: _print(sc))

Throws an exception

I've also tried the following:

Using Spark Shell and Spark version 1.6.0

class Test(val sc:org.apache.spark.SparkContext) extends Serializable{}
val test = new Test(sc)
rdd.foreach(x => print(test))

Now it finally throws java.io.NotSerializableException: org.apache.spark.SparkContext
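
For comparison, a sketch of a variant (assuming @transient behaves the usual way inside spark-shell): marking the field @transient excludes it from serialization, so the wrapper serializes without error and the executors simply see null, much like the plain print(sc) case above:

// Sketch: @transient skips the SparkContext field during serialization,
// so no NotSerializableException is thrown; the field is simply null
// once the closure is deserialized on an executor
class TransientTest(@transient val sc: org.apache.spark.SparkContext) extends Serializable
val transientTest = new TransientTest(sc)
rdd.foreach(x => print(transientTest.sc))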


Why does it work in Scala when I only print sc? And why do I get a null reference when I expected it to throw a NotSerializableException (or so I thought...)?
