How do I set spark.akka.frameSize in spark shell?
For a particular spark-shell session, I am trying:
spark-shell -Dspark.akka.frameSize=10000 --executor-memory 4g
Inside the shell, I get this:
System.getProperty("spark.executor.memory")
res0: String = 4g
System.getProperty("spark.akka.frameSize")
res1: String = null
Maybe that line is wrong, but I get a frameSize error when I call take() on my dataset:
org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 6:0 was 12518780 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large values.
This shows the default frameSize of 10M, so my setting was not picked up. Maybe I have the wrong syntax. Please help. Thanks!
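For context, the broadcast-variable route the error message points at would look roughly like this; here bigLookup is a hypothetical stand-in for the large value my tasks close over:

// A rough sketch of the broadcast-variable workaround the error suggests.
// bigLookup is a made-up example of a large value captured by task closures.
val bigLookup: Map[Int, String] = (1 to 1000000).map(i => i -> i.toString).toMap

// Broadcast it once instead of serializing it into every task.
val bigLookupBc = sc.broadcast(bigLookup)

val resolved = sc.parallelize(1 to 100).map(i => bigLookupBc.value.getOrElse(i, "?"))
resolved.take(5)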
This is described in the Spark configuration guide under Dynamically Load Spark Properties:
The spark-shell and spark-submit tools support two ways to load configurations dynamically. The first is command line options, such as the --master option shown above. spark-submit can accept any Spark property using the --conf flag, but it uses special flags for properties that play a part in launching the Spark application.
For example:
./bin/spark-submit --name "My app" --master local[4] --conf spark.akka.frameSize=100 --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
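Since spark-shell forwards its arguments through spark-submit in recent releases, the same --conf flag should presumably work for a shell session as well (a sketch, assuming Spark 1.1.0+):

spark-shell --executor-memory 4g --conf spark.akka.frameSize=100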
This syntax works in the spark shell:
spark-shell --executor-memory 4g --driver-java-options "-Dspark.akka.frameSize=100"
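To confirm the setting took effect inside the shell, something like this should work (a sketch; sc.getConf is available on the SparkContext in 1.x and reflects spark.* properties passed to the driver JVM):

// Read the property back from the SparkContext's configuration.
// Expect "100" here if the flag was picked up.
sc.getConf.get("spark.akka.frameSize")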
This was terribly unclear in the Spark documentation, which clearly still needs work in this area.
This was on Spark 1.0.1. For 1.1.0+, see Josh's answer below.