Why doesn't Spark Standalone cluster use all available cores?

I have set up the following configuration for an Apache Spark 1.2.1 standalone cluster:

  • Hadoop 2.6.0
  • 2 nodes - one master and one slave - in a standalone cluster
  • 3-node Cassandra cluster
  • total cores: 6 (2 on the master node, 4 on the slave node)
  • total memory: 13 GB

I am submitting the job to the standalone cluster manager as:

./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
               --conf num-executors=2 \
               --executor-memory 2g \
               --driver-memory 3g \
               --deploy-mode cluster \
               --supervise \
               --master spark://abc.xyz.net:7077 \
               hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json 

      

The job itself works fine, i.e. it reads data from the files and inserts it into Cassandra.

The Spark documentation says that a standalone cluster uses all available cores by default, but my cluster uses only 1 core per application. Also, while the application is running, the Spark UI shows Applications: 0 Running and Drivers: 1 Running.

My questions:

  • Why doesn't it use all 6 available cores?
  • Why does the UI show Applications: 0 Running?

Code:

public static void main(String[] args) throws Exception {

  String fileName = args[0];
  System.out.println("----->Filename : "+fileName);        

  Long now = new Date().getTime();

  SparkConf conf = new SparkConf(true)
           .setMaster("local")
           .setAppName("JavaSparkSQL_" +now)
           .set("spark.executor.memory", "1g")
           .set("spark.cassandra.connection.host", "192.168.1.65")
           .set("spark.cassandra.connection.native.port", "9042")
           .set("spark.cassandra.connection.rpc.port", "9160");

  JavaSparkContext ctx = new JavaSparkContext(conf);

  JavaRDD<String> input =  ctx.textFile("hdfs://abc.xyz.net:9000/dataLoad/resources/" + fileName,6);
  JavaRDD<DataInput> result = input.mapPartitions(new ParseJson()).filter(new FilterLogic());

  System.out.print("Count --> "+result.count());
  System.out.println(StringUtils.join(result.collect(), ","));

  javaFunctions(result).writerBuilder("ks","pt_DataInput",mapToRow(DataInput.class)).saveToCassandra();

}

      

+3




3 answers


If you set the master in your app to local (via .setMaster("local")), it will not connect to spark://abc.xyz.net:7077.

You do not need to set the master in the application at all if you configure it through spark-submit: the --master flag only takes effect when the application does not hard-code one.
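For illustration, a minimal sketch of a submit-friendly version of the setup, assuming the host names and Cassandra address from the question (the class name is hypothetical):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SubmitFriendlyApp {
  public static void main(String[] args) {
    // No .setMaster(...) here: when the jar is launched through
    // spark-submit, the --master spark://abc.xyz.net:7077 flag
    // supplies the master URL, so the app registers with the
    // standalone cluster instead of running in a local JVM.
    SparkConf conf = new SparkConf(true)
        .setAppName("JavaSparkSQL")
        .set("spark.cassandra.connection.host", "192.168.1.65");
    JavaSparkContext ctx = new JavaSparkContext(conf);
    // ... same job logic as in the question ...
    ctx.stop();
  }
}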

+6




What happened is that you thought you were using standalone mode, which by default takes all available nodes, but in reality you were running in local mode, with "local" as the master. Local mode is a non-distributed, single-JVM deployment: with plain local Spark uses only 1 worker thread, and even with local[*] it only uses the cores of the one machine it runs on, never the cluster. That is why, when you changed your master parameter to spark://abc.xyz.net:7077, everything worked as you expected.
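As a sketch of what the different master URLs mean, using the cluster URL from the question (the class name and printout are just for illustration):

import org.apache.spark.SparkConf;

public class MasterUrls {
  public static void main(String[] args) {
    // "local"                    -> 1 worker thread in a single JVM (the asker's setting)
    // "local[*]"                 -> one thread per core, still a single JVM on one machine
    // "spark://abc.xyz.net:7077" -> standalone cluster: executors on the workers
    SparkConf conf = new SparkConf()
        .setAppName("MasterUrlDemo")
        .setMaster("spark://abc.xyz.net:7077");
    System.out.println("Master in effect: " + conf.get("spark.master"));
  }
}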



+2




Try setting the master to local[*]; this will use all the cores of the local machine.
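As a quick sanity check, defaultParallelism in local mode reports the number of threads Spark is using (a hypothetical snippet; note this is still one machine, not the 6 cluster cores the question asks about):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalCoresCheck {
  public static void main(String[] args) {
    JavaSparkContext ctx = new JavaSparkContext(
        new SparkConf().setAppName("LocalCoresCheck").setMaster("local[*]"));
    // With local[*] this prints the core count of the local machine;
    // with plain "local" it would print 1.
    System.out.println("Cores in use: " + ctx.defaultParallelism());
    ctx.stop();
  }
}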

0








