Why can I reference the SparkContext inside an RDD operation?
SparkContext is not serializable. It is meant to be used by the driver, so can anyone explain the following?
Using the Spark shell, on YARN, with Spark version 1.6.0:
val rdd = sc.parallelize(Seq(1))
rdd.foreach(x => print(sc))
Prints nothing on the client (the output appears on the executors)
Using the Spark shell, local master, and Spark 1.6.0:
val rdd = sc.parallelize(Seq(1))
rdd.foreach(x => print(sc))
Prints out "null" on the client
Using pyspark, local master, and Spark version 1.6.0:
rdd = sc.parallelize([1])
def _print(x):
    print(x)
rdd.foreach(lambda x: _print(sc))
Throws an exception
I've also tried the following:
Using the Spark shell and Spark version 1.6.0:
class Test(val sc:org.apache.spark.SparkContext) extends Serializable{}
val test = new Test(sc)
rdd.foreach(x => print(test))
Now it finally throws java.io.NotSerializableException: org.apache.spark.SparkContext
Why does it work in Scala when I only print sc? Why do I get a null reference when I expected a NotSerializableException (or so I thought)?
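The two behaviors above can at least be imitated outside of Spark. This is a minimal sketch, not Spark's actual code: `FakeContext` and `ReplLine` are hypothetical stand-ins. It assumes (a) that pyspark's SparkContext deliberately raises from its pickling hooks, which would explain the Python exception, and (b) that with Java-style serialization a field excluded from serialization (as with `@transient`) simply comes back as null, which would explain the Scala shell printing "null":

```python
import pickle

# Hypothetical stand-in for pyspark's SparkContext: raising from a
# pickling hook makes any attempt to ship the object to a worker fail.
class FakeContext:
    def __reduce__(self):
        raise RuntimeError("FakeContext cannot be pickled")

# Hypothetical stand-in for the REPL wrapper object that holds sc:
# dropping the field in __getstate__ mimics a @transient field under
# Java serialization, so it deserializes as None ("null").
class ReplLine:
    def __init__(self, ctx):
        self.sc = ctx  # the "transient" reference to the context

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["sc"]  # skipped during serialization
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.sc = None  # comes back as None on the "executor" side

# pyspark-style behavior: serialization itself blows up.
try:
    pickle.dumps(FakeContext())
except RuntimeError as e:
    print("pyspark-style failure:", e)

# Scala-shell-style behavior: serialization succeeds, but the
# reference is gone after the round trip.
copy = pickle.loads(pickle.dumps(ReplLine(FakeContext())))
print("Scala-shell-style result:", copy.sc)  # None
```

If this model is right, the difference between the three runs is not in foreach itself but in how each serializer treats the captured reference to sc: Python's pickler fails fast, while Java serialization of the shell's wrapper object silently drops the field.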