Scala Spark: connect to a remote cluster
I want to connect to a remote cluster and run a Spark job. From what I've read, this is specified in SparkConf:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")
where my_ip is the IP address of my cluster. Unfortunately, I am getting a connection refused error, so my guess is that some credentials need to be added in order to connect correctly. How can I provide credentials? It looks like it would be done with .set(key, value), but there is no documentation on that.
+3
2 answers
Two things are missing:

- The cluster manager must be set to yarn (setMaster("yarn")) and the deploy mode must be cluster; your current setting is used to run Spark in standalone mode (see the sketch after this list). More details here: http://spark.apache.org/docs/latest/configuration.html#application-properties
- You also need to copy yarn-site.xml and core-site.xml from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the address of your ResourceManager node. More information: http://theckang.com/2015/remote-spark-jobs-on-yarn/
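A minimal sketch of what that configuration looks like in code, assuming the cluster's config files are already in place (the app name and path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Assumes the cluster's yarn-site.xml and core-site.xml have been copied
// into the directory that HADOOP_CONF_DIR points to, e.g.
//   export HADOOP_CONF_DIR=/path/to/conf
val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("yarn") // the ResourceManager address is read from yarn-site.xml
val sc = new SparkContext(conf)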
By the way, this works if you use spark-submit to submit the job; doing it programmatically is more complex to achieve and can only use yarn-client mode, which is difficult to set up remotely.
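For reference, a typical spark-submit invocation for this setup might look like the following (the class and jar names are placeholders):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar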
+2
- To launch an application on YARN with Spark, you must use --master yarn with your spark-submit command, or setMaster("yarn") when initializing the application configuration.
- If "spark-submit" has to be issued from a remote host, the popular Java Secure Channel (JSch) library can be used to send the command over SSH (see the sketch below); of course, the environment must be properly configured on the cluster.
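A rough sketch of that JSch approach, assuming key-based SSH access to an edge node of the cluster (the host, user, key path, class, and jar path are all placeholders):

import com.jcraft.jsch.{ChannelExec, JSch}
import scala.io.Source

val jsch = new JSch()
jsch.addIdentity("/home/me/.ssh/id_rsa")           // hypothetical private key
val session = jsch.getSession("user", "edge-node.example.com", 22)
session.setConfig("StrictHostKeyChecking", "no")   // demo only; verify host keys in practice
session.connect()

// Run spark-submit on the cluster's edge node
val channel = session.openChannel("exec").asInstanceOf[ChannelExec]
channel.setCommand(
  "spark-submit --master yarn --deploy-mode cluster " +
  "--class com.example.MyApp /path/to/my-app.jar") // hypothetical class and jar
val out = channel.getInputStream                   // grab stdout before connecting
channel.connect()

println(Source.fromInputStream(out).mkString)      // print the remote command's output

channel.disconnect()
session.disconnect()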
0