Spark Mesos Dispatcher

My team is deploying a new Big Data architecture to Amazon Cloud. We have Mesos and Spark up and running.

We submit Spark jobs (i.e. jars) from a bastion host within the same cluster. However, the bastion host runs the driver program, so this is what is called client mode (if I understood correctly).
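For reference, our current submission looks roughly like this (the ZooKeeper hosts, class name, and jar path are placeholders):

# Client mode: the driver runs on the bastion host itself
spark-submit \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --deploy-mode client \
  --class com.example.MyJob \
  /home/ubuntu/my-job.jar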

We would like to try cluster mode, but we are not sure where to start the dispatcher process.

The documentation says to run it on a cluster, but I'm confused because our masters don't have Spark installed and we use ZooKeeper for master election. Running it on a slave node is not an available option either, as the slave might fail and we don't want to expose the slave IP or public DNS to the bastion host.

Is it correct to run the dispatcher on the bastion host?

Many thanks

+1




3 answers


The documentation is not very detailed. However, we are quite happy with what we found: according to the documentation, cluster mode is not supported for Mesos clusters (nor for Python applications).

However, we started the dispatcher using --master mesos://zk://...
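A minimal sketch of how we start it, assuming the start-mesos-dispatcher.sh script shipped in Spark's sbin directory and placeholder ZooKeeper hosts:

# Start the MesosClusterDispatcher against the ZooKeeper-backed Mesos masters
./sbin/start-mesos-dispatcher.sh \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos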

To submit applications, you need the following:



spark-submit --deploy-mode cluster <other options> --master mesos://<dispatcher_ip>:7077 <ClassName> <jar>

      

If you run this command from the bastion machine it will not work, because the Mesos master will look for the submitted jar at the same path as on the bastion. We ended up publishing the jar as a downloadable URL.
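Roughly, the final submission looked like this (dispatcher address, jar URL, and class name are placeholders):

# Cluster mode: the jar URL must be reachable by the Mesos agents, not a local bastion path
spark-submit \
  --deploy-mode cluster \
  --master mesos://dispatcher-host:7077 \
  --class com.example.MyJob \
  http://some-host/artifacts/my-job.jar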

Hope it helps

+5




I have not used cluster mode on Mesos, and the description of cluster mode is not very detailed. There is no --help option in the script, as there should be, IMHO. However, if you don't pass the --master argument, it errors out with a help message, and it turns out there is a --zk option for specifying the ZooKeeper URL.

What might work is running this script on the bastion itself with the appropriate --master and --zk options. Would that work for you?
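For example, assuming the script in question is Spark's sbin/start-mesos-dispatcher.sh, something along these lines (untested; the ZooKeeper hosts are placeholders):

# Run the dispatcher on the bastion, pointing --master and --zk at the ZooKeeper ensemble
./sbin/start-mesos-dispatcher.sh \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --zk zk1:2181,zk2:2181,zk3:2181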

0




You can use a Docker image with Spark and your application.jar instead of uploading the jar to S3. I haven't tried it yet, but I think it should work. The relevant setting is the SPARK_DIST_CLASSPATH environment variable in spark-env.sh. I am using a Spark distribution compiled without Hadoop, together with Apache Hadoop 2.7.1:

export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath):/opt/hadoop/share/hadoop/tools/lib/*:/opt/application.jar

      

0








