Why doesn't Spark Standalone cluster use all available cores?

I have the following configuration for an Apache Spark 1.2.1 Standalone cluster:

  • Hadoop 2.6.0
  • 2 nodes (one master and one slave) in a standalone cluster
  • 3-node Cassandra cluster
  • total cores: 6 (2 on the master, 4 on the slave)
  • total memory: 13 GB

I run Spark with the standalone cluster manager as follows:

./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
               --conf num-executors=2 \
               --executor-memory 2g \
               --driver-memory 3g \
               --deploy-mode cluster \
               --supervise \
               --master spark://abc.xyz.net:7077 \
               hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json


My job works fine, i.e. it reads data from files and inserts it into Cassandra.

The Spark documentation says that a standalone cluster uses all available cores, but my cluster only uses 1 core per application. Also, after I start an application, the Spark UI shows Applications: 0 running and Drivers: 1 running.

My questions:

  • Why doesn't it use all 6 available cores?
  • Why does the Spark UI show Applications: 0 running?


public static void main(String[] args) throws Exception {

  String fileName = args[0];
  System.out.println("----->Filename : "+fileName);        

  Long now = new Date().getTime();

  SparkConf conf = new SparkConf(true)
           .setAppName("JavaSparkSQL_" +now)
           .set("spark.executor.memory", "1g")
           .set("spark.cassandra.connection.host", "")
           .set("spark.cassandra.connection.native.port", "9042")
           .set("spark.cassandra.connection.rpc.port", "9160");

  JavaSparkContext ctx = new JavaSparkContext(conf);

  JavaRDD<String> input =  ctx.textFile("hdfs://abc.xyz.net:9000/dataLoad/resources/" + fileName,6);
  JavaRDD<DataInput> result = input.mapPartitions(new ParseJson()).filter(new FilterLogic());

  System.out.print("Count --> "+result.count());
  System.out.println(StringUtils.join(result.collect(), ","));
}






3 answers

If you set the master in your app to local (via .setMaster("local")), it won't connect to spark://abc.xyz.net:7077.


You do not need to set the master in the application if you already set it with the spark-submit command.
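To illustrate why the master set in code wins over the one passed to spark-submit, here is a toy model of the precedence order described in the Spark configuration docs (values set on SparkConf first, then spark-submit flags, then spark-defaults.conf). It is only a sketch: the class MasterPrecedence and the method resolveMaster are hypothetical names invented for this example and are not part of Spark.

```java
// Toy model of Spark's documented property precedence; NOT Spark's own code.
// MasterPrecedence and resolveMaster are hypothetical names used for illustration.
class MasterPrecedence {

    // Returns the first non-null candidate, in the documented order:
    // 1) value set in code on SparkConf, 2) spark-submit flag, 3) spark-defaults.conf.
    static String resolveMaster(String setInCode, String submitFlag, String defaultsFile) {
        String[] candidates = { setInCode, submitFlag, defaultsFile };
        for (String candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        throw new IllegalStateException("A master URL must be set somewhere");
    }

    public static void main(String[] args) {
        // The value passed with --master in the question.
        String submitFlag = "spark://abc.xyz.net:7077";

        // .setMaster("local") in the application hides the spark-submit value.
        System.out.println(resolveMaster("local", submitFlag, null)); // prints local

        // Without setMaster in code, the spark-submit value is used.
        System.out.println(resolveMaster(null, submitFlag, null)); // prints spark://abc.xyz.net:7077
    }
}
```

In other words, dropping the setMaster call from the application lets the --master value from spark-submit take effect.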




What happened is that you thought you were using standalone mode, which by default uses all available nodes, but in reality you were using local mode, with "local" as the master. In local mode, even if you set local[*], Spark will always use only 1 core, since local mode is a non-distributed, single-JVM deployment mode. This is why everything worked as you expected once you changed the master parameter to "spark://abc.xyz.net:7077".



Try setting the master to local[*]; this will use all the cores.
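For reference, the Spark documentation on master URLs describes "local" as one worker thread, "local[K]" as K worker threads, and "local[*]" as one worker thread per logical core on the machine. The sketch below only mimics that mapping so the difference is easy to see; LocalMasterThreads and workerThreads are hypothetical names, not Spark APIs, and the core count is passed in as a parameter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy parser that mimics the documented meaning of local master URLs; NOT Spark's own code.
// LocalMasterThreads and workerThreads are hypothetical names used for illustration.
class LocalMasterThreads {

    // Matches local[K] and local[*].
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\*|\\d+)\\]");

    // Returns how many worker threads the given local master URL asks for.
    static int workerThreads(String master, int availableCores) {
        if (master.equals("local")) {
            return 1; // plain "local" means a single worker thread
        }
        Matcher m = LOCAL_N.matcher(master);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a local master URL: " + master);
        }
        String n = m.group(1);
        return n.equals("*") ? availableCores : Integer.parseInt(n);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("local    -> " + workerThreads("local", cores));
        System.out.println("local[4] -> " + workerThreads("local[4]", cores));
        System.out.println("local[*] -> " + workerThreads("local[*]", cores));
    }
}
```

Note that any local master runs everything in a single JVM on one machine, so it does not use the cores of the standalone cluster's workers.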


