Spark spark.shuffle.memoryFraction has no effect

I am testing Spark on Amazon EMR using Python and the basic wordcount example that comes with Spark.

After running the application, I noticed that in Stage 0, reduceByKey(add), roughly 2.5 GB is shuffled in memory and 4 GB is spilled to disk. Since the wordcount example does not cache or persist any data, I thought I could improve the performance of this application by giving more memory to the shuffle. So, in spark-defaults.conf, I added the following:

spark.storage.memoryFraction    0.2
spark.shuffle.memoryFraction    0.6
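
To make the intent of these two settings concrete, here is a rough sketch of the arithmetic. It assumes the Spark 1.x legacy memory model on the JVM executor side, where the shuffle budget is approximately heap × spark.shuffle.memoryFraction × spark.shuffle.safetyFraction (default 0.8) and the storage budget is heap × spark.storage.memoryFraction × spark.storage.safetyFraction (default 0.9). Using spark.executor.memory as the heap size is an approximation, and budget_mb is a hypothetical helper name, not a Spark API.

```python
# Rough estimate of JVM-side memory budgets under the Spark 1.x legacy memory model.
# Assumption: the usable heap is approximated by spark.executor.memory.

def budget_mb(heap_mb, fraction, safety_fraction):
    """Return heap_mb * fraction * safety_fraction, in MB (hypothetical helper)."""
    return heap_mb * fraction * safety_fraction

EXECUTOR_MEMORY_MB = 9404        # spark.executor.memory from my spark-defaults.conf
SHUFFLE_SAFETY_FRACTION = 0.8    # default of spark.shuffle.safetyFraction
STORAGE_SAFETY_FRACTION = 0.9    # default of spark.storage.safetyFraction

# Spark 1.x default shuffle fraction (0.2) versus the values I set (0.6 and 0.2).
default_shuffle_mb = budget_mb(EXECUTOR_MEMORY_MB, 0.2, SHUFFLE_SAFETY_FRACTION)
tuned_shuffle_mb = budget_mb(EXECUTOR_MEMORY_MB, 0.6, SHUFFLE_SAFETY_FRACTION)
tuned_storage_mb = budget_mb(EXECUTOR_MEMORY_MB, 0.2, STORAGE_SAFETY_FRACTION)

print("shuffle budget with defaults: %.0f MB" % default_shuffle_mb)
print("shuffle budget with 0.6:      %.0f MB" % tuned_shuffle_mb)
print("storage budget with 0.2:      %.0f MB" % tuned_storage_mb)
```

Under these assumptions the per-executor shuffle budget should grow from about 1.5 GB to about 4.5 GB, which is why I expected less spilling.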

      

However, I still get the same performance, with the same amount of shuffle data spilled to disk and held in memory. I have confirmed that Spark is reading these settings: the Environment tab of the Spark UI shows my changes. Moreover, when I set spark.shuffle.spill to false, I got the result I was looking for, and all the shuffle data was kept in memory only.

So, what am I getting wrong here, and why is the extra shuffle memory fraction not being used?

My environment:
Amazon EMR with Spark 1.3.1, installed using the -x argument
1 Master node: m3.xlarge
3 Core nodes: m3.xlarge
Application: wordcount.py
Input: 10 .gz files, 90MB each (~350MB uncompressed), stored in S3
Submit command:

/home/hadoop/spark/bin/spark-submit --deploy-mode client /mnt/wordcount.py s3n://<input location>

      

spark-defaults.conf:

spark.eventLog.enabled          false
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70
spark.driver.extraJavaOptions   -Dspark.driver.log.level=INFO
spark.master                    yarn
spark.executor.instances        3
spark.executor.cores            4
spark.executor.memory           9404M
spark.default.parallelism       12
spark.eventLog.enabled          true
spark.eventLog.dir              hdfs:///spark-logs/
spark.storage.memoryFraction    0.2
spark.shuffle.memoryFraction    0.6
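
Note that the file above sets spark.eventLog.enabled twice (false, then true). As a sanity check on the file itself, here is a minimal sketch that lists keys appearing more than once. It assumes whitespace-separated key/value lines like the ones shown (the real properties format also accepts other separators), and parse_spark_defaults and duplicated_keys are hypothetical helper names, not part of Spark.

```python
from collections import defaultdict

def parse_spark_defaults(text):
    """Map each property key to the list of its values, in file order."""
    values = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Assumption: key and value are separated by whitespace only.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        values[key].append(value)
    return dict(values)

def duplicated_keys(values):
    """Return only the keys that were set more than once."""
    return {key: vals for key, vals in values.items() if len(vals) > 1}

# Abbreviated copy of the spark-defaults.conf shown above.
SAMPLE = """\
spark.eventLog.enabled          false
spark.master                    yarn
spark.eventLog.enabled          true
spark.storage.memoryFraction    0.2
spark.shuffle.memoryFraction    0.6
"""

dups = duplicated_keys(parse_spark_defaults(SAMPLE))
print(dups)
```

For this sample, only spark.eventLog.enabled is reported; the two memory fraction settings each appear once.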

      
