Scala Spark: connect to a remote cluster

I want to connect to a remote cluster and run a Spark process on it. From what I've read, this is specified in SparkConf:

 val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")


Where my_ip is the IP address of my cluster. Unfortunately, I get a "connection refused" error, so my guess is that some credentials need to be added in order to connect correctly. How can I provide credentials? It looks like it would be done with .set(key, value), but I can't find any indication of which keys to use.
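For what it's worth, the general .set(key, value) mechanism on SparkConf looks like the sketch below. The keys shown are ordinary Spark properties picked only to illustrate the call syntax; they are not credential settings, and the values are placeholders.

 val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")
  // .set takes a property key and a string value and returns the SparkConf,
  // so it chains like the setters above
  .set("spark.executor.memory", "2g")
  .set("spark.network.timeout", "120s")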



2 answers


Two things are missing:

  • The cluster manager must be set to YARN (setMaster("yarn")) and the deploy mode must be cluster; your current setting points at a Spark standalone master. More details here: http://spark.apache.org/docs/latest/configuration.html#application-properties
  • You also need to copy yarn-site.xml and core-site.xml from the cluster and place them in HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the address of your master node. More information: http://theckang.com/2015/remote-spark-jobs-on-yarn/ (see the sketch after this list).
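A minimal sketch of both points combined, assuming the copied yarn-site.xml and core-site.xml already sit in a local directory; the path in the comment is a placeholder.

 import org.apache.spark.{SparkConf, SparkContext}

 // HADOOP_CONF_DIR must point at the copied cluster config before the JVM starts, e.g.
 //   export HADOOP_CONF_DIR=/path/to/cluster-conf
 val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("yarn")   // cluster manager: YARN instead of spark://my_ip:7077
 // Creating the context in this JVM keeps the driver local (yarn client mode);
 // "cluster" deploy mode is what you would request through spark-submit instead.
 val sc = new SparkContext(conf)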


By the way, this works out of the box if you use spark-submit to send the job; doing it programmatically is more complex to achieve and can only use yarn-client mode, which is difficult to set up against a remote cluster.
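If you do need to trigger the submission from code, one option the answer does not mention is Spark's own org.apache.spark.launcher.SparkLauncher, which runs spark-submit as a child process; a rough sketch in which every path and class name is a placeholder.

 import org.apache.spark.launcher.SparkLauncher

 // Launches spark-submit in a separate process
 val process = new SparkLauncher()
  .setSparkHome("/path/to/spark")          // local Spark installation
  .setAppResource("/path/to/my-app.jar")   // application jar to submit
  .setMainClass("com.example.MyApp")       // placeholder main class
  .setMaster("yarn")
  .setDeployMode("cluster")
  .launch()                                // returns a java.lang.Process
 process.waitFor()                         // block until spark-submit exits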



  • To launch an application on YARN with Spark, you must pass --master yarn to your spark-submit command, or call setMaster("yarn") when initializing the application configuration.
  • If you go the spark-submit route, the popular Java Secure Channel (JSch) library can be used to send the command to the remote host from your own code (see the sketch after this list); of course, the environment must be properly configured on the cluster.
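A rough sketch of the JSch idea, assuming password authentication; the host, user, password, and jar path are all placeholders, and host-key checking is disabled only to keep the example short.

 import com.jcraft.jsch.{ChannelExec, JSch}
 import scala.io.Source

 // Open an SSH session to the cluster's edge node (placeholder credentials)
 val session = new JSch().getSession("user", "my_ip", 22)
 session.setPassword("password")
 session.setConfig("StrictHostKeyChecking", "no")   // for a quick test only
 session.connect()

 // Run spark-submit on the remote host
 val channel = session.openChannel("exec").asInstanceOf[ChannelExec]
 channel.setCommand("spark-submit --master yarn --deploy-mode cluster /path/to/my-app.jar")
 val output = channel.getInputStream   // grab the stream before connecting
 channel.connect()                     // executes the command
 println(Source.fromInputStream(output).mkString)
 channel.disconnect()
 session.disconnect()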








