How do I set spark.akka.frameSize in spark shell?
For a particular spark-shell session, I am trying:
spark-shell -Dspark.akka.frameSize=10000 --executor-memory 4g
Inside the shell, I get this:
System.getProperty("spark.executor.memory")
res0: String = 4g
System.getProperty("spark.akka.frameSize")
res1: String = null
Maybe that line is wrong, but I get a frameSize error when I call take() on my dataset:
org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 6:0 was 12518780 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large values.
This shows the default frameSize of 10M, so my setting was not picked up. Maybe I have the wrong syntax. Please help. Thanks!
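For context, the broadcast-variable route the error message points at would look roughly like this; here bigLookup is a hypothetical stand-in for the large value my tasks close over:

// A rough sketch of the broadcast-variable workaround the error suggests.
// bigLookup is a made-up example of a large value captured by task closures.
val bigLookup: Map[Int, String] = (1 to 1000000).map(i => i -> i.toString).toMap

// Broadcast it once instead of serializing it into every task.
val bigLookupBc = sc.broadcast(bigLookup)

val resolved = sc.parallelize(1 to 100).map(i => bigLookupBc.value.getOrElse(i, "?"))
resolved.take(5)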
This is described in the Spark configuration guide under Dynamically Load Spark Properties:
The spark-shell and spark-submit tools support two ways to load configurations dynamically. The first is command line options, such as the --master option shown above. spark-submit can accept any Spark property using the --conf flag, but it uses special flags for properties that play a part in launching the Spark application.
For example:
./bin/spark-submit --name "My app" --master local[4] --conf spark.akka.frameSize=100 --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
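Since spark-shell forwards its arguments through spark-submit in recent releases, the same --conf flag should presumably work for a shell session as well (a sketch, assuming Spark 1.1.0+):

spark-shell --executor-memory 4g --conf spark.akka.frameSize=100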
This syntax works in the spark shell:
spark-shell --executor-memory 4g --driver-java-options "-Dspark.akka.frameSize=100"
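To confirm the setting took effect inside the shell, something like this should work (a sketch; sc.getConf is available on the SparkContext in 1.x and reflects spark.* properties passed to the driver JVM):

// Read the property back from the SparkContext's configuration.
// Expect "100" here if the flag was picked up.
sc.getConf.get("spark.akka.frameSize")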
This was terribly unclear in the Spark documentation, which clearly still needs work in this area.
This was on Spark 1.0.1. For 1.1.0+, see Josh's answer below.