How do I set up a clustered environment for Spark applications on Windows machines?

I have been developing in PySpark in standalone, non-clustered mode. Now I would like to learn more about Spark's cluster mode. I searched the web and found that I might need a cluster manager, such as Apache Mesos or Spark Standalone, to run a cluster across different machines. But I could not easily find the details of the whole picture.

In terms of system design, how do I set things up to run a Spark cluster on multiple Windows machines (or multiple Windows VMs)?



1 answer


You might want to explore the options starting from the most basic: Spark Standalone, then Hadoop YARN, then Apache Mesos or DC/OS. See Cluster Mode Overview.

I would recommend starting with Spark Standalone (it is the easiest option for submitting Spark applications). Spark Standalone is included with any Spark installation and works fine on Windows. The problem is that there are no scripts to start and stop the standalone Master and Workers (aka slaves) on Windows. You simply have to "code" them yourself.

Use the following to start the standalone Master on Windows:

// terminal 1
bin\spark-class org.apache.spark.deploy.master.Master

      

Note that after starting the standalone Master you do not get a prompt back, but don't worry: navigate to http://localhost:8080/ to see the web UI of the Spark Standalone cluster.

Start an instance of the standalone Worker in a separate terminal:
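If you would rather script that check than open a browser, here is a minimal sketch. It only assumes that the Master web UI answers plain HTTP on the address mentioned above; the function name `master_ui_is_up` is made up for illustration.

```python
import urllib.request


def master_ui_is_up(url="http://localhost:8080/", timeout=3.0):
    """Return True if an HTTP GET to the Master web UI answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Master web UI reachable:", master_ui_is_up())
```

A `False` result just means nothing answered at that URL within the timeout, for example because the Master is still starting.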

// terminal 2
bin\spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077
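Since there are no start/stop scripts for Windows, the two commands above can be wrapped in a small launcher. The following is only a sketch under a few assumptions: the install location `C:\spark` is hypothetical, the helper names are my own, and it relies on `bin\spark-class.cmd` plus the Master's `--host` and `--port` options.

```python
import subprocess
from pathlib import PureWindowsPath

MASTER_CLASS = "org.apache.spark.deploy.master.Master"
WORKER_CLASS = "org.apache.spark.deploy.worker.Worker"


def spark_class(spark_home):
    """Path to the Windows launcher script inside a Spark installation."""
    return str(PureWindowsPath(spark_home) / "bin" / "spark-class.cmd")


def master_command(spark_home, host="localhost", port=7077):
    """Command line that starts the standalone Master on the given host and port."""
    return [spark_class(spark_home), MASTER_CLASS, "--host", host, "--port", str(port)]


def worker_command(spark_home, master_host="localhost", port=7077):
    """Command line that starts a Worker and registers it with the Master."""
    return [spark_class(spark_home), WORKER_CLASS, f"spark://{master_host}:{port}"]


def start(command):
    """Start the process without waiting for it; keep the handle to terminate it later."""
    return subprocess.Popen(command)


if __name__ == "__main__":
    home = r"C:\spark"  # hypothetical install location, adjust to your machine
    print(" ".join(master_command(home)))
    print(" ".join(worker_command(home)))
```

For several machines, the idea would be to start the Master with `host` set to a name the other machines can resolve (not `localhost`), and then run `worker_command` on each remaining machine with `master_host` pointing at it.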

      

With the single-worker Spark Standalone cluster running, you can submit Spark applications as follows:



spark-submit --master spark://localhost:7077 ...
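If you end up submitting from a script, the command line can be assembled like this. This is a rough sketch: `submit_command` and the file name `my_app.py` are placeholders of mine, and only the `--master` and `--name` options of `spark-submit` are used.

```python
import subprocess


def submit_command(app, master="spark://localhost:7077", name=None, app_args=()):
    """Build a spark-submit command line for an application file."""
    command = ["spark-submit", "--master", master]
    if name:
        command += ["--name", name]
    command.append(app)        # the application comes after the options
    command.extend(app_args)   # everything after it is passed to the application
    return command


if __name__ == "__main__":
    # my_app.py is a hypothetical PySpark script
    cmd = submit_command("my_app.py", name="demo", app_args=["input.txt"])
    print(subprocess.list2cmdline(cmd))
```

When the Master runs on another machine, pass its address as `master`, for example `spark://<master-host>:7077`.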

      

Read Spark Standalone Mode in the official Spark documentation.


As I just found out, Mesos is not an option given its System Requirements:

Mesos runs on Linux (64 bit) and Mac OS X (64 bit).

However, you could run any of these cluster managers in virtual machines using VirtualBox or similar. At least DC/OS has dcos-vagrant, which should make it fairly easy:

dcos-vagrant: Quickly provision a DC/OS cluster on a local machine for development, testing, or demonstration.

Deploying DC/OS Vagrant involves creating a local cluster of VirtualBox VMs using the dcos-vagrant-box base image and then installing DC/OS.
