How to install Apache Spark on Hortonworks HDP 2.2 (built using Ambari)

I have successfully built a 5-node Hortonworks HDP 2.2 cluster using Ambari.

However, I don't see Apache Spark in the list of installed services.

I did some research and found that Ambari does not install certain components, such as Hue. (Spark wasn't on that list, but I guess it isn't installed either.)

How do I manually install Apache Spark on my 5-node HDP 2.2 cluster?

Or do I need to remove the cluster and do a fresh install without using Ambari?



3 answers


Hortonworks support for Spark is on the way but still incomplete (details and blog).



Instructions for integrating Spark with HDP can be found here.
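If you follow those instructions and lay Spark down on one of the cluster's client nodes yourself, a quick way to confirm that the installation can actually reach the cluster is to submit a tiny PySpark job to YARN. This is only a sketch: the script name, the yarn-client master, and the assumption that HADOOP_CONF_DIR already points at the cluster's Hadoop configuration are mine, not something the linked instructions prescribe.

# smoke_test.py -- minimal PySpark job to check that Spark can reach YARN.
# Assumes Spark was unpacked on an HDP client node and HADOOP_CONF_DIR points
# at the cluster's Hadoop configuration (e.g. /etc/hadoop/conf). Submit with:
#   spark-submit --master yarn-client smoke_test.py

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("hdp-spark-smoke-test")
sc = SparkContext(conf=conf)

# Distribute a small dataset across the cluster and pull a result back.
total = sc.parallelize(range(1000)).map(lambda x: x * x).sum()
print("Sum of squares 0..999 = %d" % total)

sc.stop()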



You can create your own Ambari stack for Spark. I recently did just that, but I cannot share this code :(

What I can share is a tutorial I wrote on how to build any custom stack for Ambari, including one for Spark. There are a number of interesting Spark-specific problems that the tutorial does not cover, but it should still help: http://bit.ly/1HDBgS6



There is also a guide from the Ambari folks here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133
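To give a flavour of what both guides walk you through, below is a rough skeleton of the Python command script that a custom SPARK service would ship next to its metainfo.xml. The Script and Execute classes come from Ambari's resource_management library; everything else (the class name, the Spark paths, the spark user) is a placeholder you would replace with your own packaging choices.

# package/scripts/master.py -- skeleton command script for a custom Ambari service.
# Ambari invokes these methods on the target host; the Spark paths below are
# placeholders, not part of any official Spark service definition.
from resource_management import Script, Execute

class SparkMaster(Script):
    def install(self, env):
        # Install whatever OS packages the service's metainfo.xml declares.
        self.install_packages(env)

    def configure(self, env):
        # A real script would render spark-env.sh / spark-defaults.conf here.
        pass

    def start(self, env):
        self.configure(env)
        # Placeholder command: start the Spark standalone master daemon.
        Execute('/opt/spark/sbin/start-master.sh', user='spark')

    def stop(self, env):
        Execute('/opt/spark/sbin/stop-master.sh', user='spark')

    def status(self, env):
        # A real script would check the master's pid file here and raise
        # ComponentIsNotRunning when the daemon is down.
        pass

if __name__ == "__main__":
    SparkMaster().execute()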



1) Ambari 1.7.x does not install the Accumulo, Hue, Ranger, or Solr services for the HDP 2.2 stack. To install the Accumulo, Hue, Knox, Ranger, and Solr services, install HDP manually.


2) Apache Spark 1.2.0 on YARN with HDP 2.2: here. (A sketch of such a YARN submission follows after item 3 below.)


3) Spark and Hadoop: working together:

Standalone deployment: With a standalone deployment, you can statically allocate resources on all or a subset of the machines in a Hadoop cluster and run Spark side by side with Hadoop MapReduce. Users can then run arbitrary Spark jobs on their HDFS data. Its simplicity makes this the deployment of choice for many Hadoop 1.x users. (A minimal standalone example also follows below.)

Hadoop YARN deployment: Hadoop users who have already deployed or plan to deploy Hadoop YARN can simply run Spark on YARN without any pre-installation or administrative access required. This lets users easily integrate Spark into their Hadoop stack and take advantage of the full power of Spark, as well as of other components that run on top of it.

Spark in MapReduce (SIMR): For Hadoop users who are not running YARN yet, another option besides standalone deployment is to use SIMR to launch Spark jobs inside MapReduce. With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes of downloading it! This significantly lowers the barrier to deployment and lets virtually everyone play with Spark.
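Regarding item 2: a common wrinkle when running Spark 1.2 on YARN against HDP 2.2 is that the cluster's YARN classpath entries contain an ${hdp.version} placeholder, so the submission usually has to pass the cluster's HDP build string explicitly. The sketch below shows one way to do that; the build string, examples-jar path, and executor count are placeholders, and the directory names under /usr/hdp on your nodes tell you the real version string.

# submit_spark_pi.py -- sketch of launching the bundled SparkPi example on YARN.
# Run it from the directory where Spark was unpacked. The HDP build string and
# the examples-jar path are placeholders for whatever your cluster actually has.
import subprocess

HDP_VERSION = "2.2.0.0-2041"                               # placeholder build string
EXAMPLES_JAR = "lib/spark-examples-1.2.0-hadoop2.6.0.jar"  # placeholder path

cmd = [
    "./bin/spark-submit",
    "--class", "org.apache.spark.examples.SparkPi",
    "--master", "yarn-cluster",
    "--num-executors", "2",
    # Make ${hdp.version} in the YARN/MapReduce classpath entries resolve:
    "--conf", "spark.driver.extraJavaOptions=-Dhdp.version=%s" % HDP_VERSION,
    "--conf", "spark.yarn.am.extraJavaOptions=-Dhdp.version=%s" % HDP_VERSION,
    EXAMPLES_JAR,
    "10",  # number of partitions used for the Pi estimate
]
subprocess.check_call(cmd)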
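And as a concrete illustration of the standalone option above, here is a minimal PySpark job that connects to a standalone Spark master running alongside the Hadoop daemons and reads its input straight from HDFS. The hostnames, port, and HDFS path are all hypothetical.

# standalone_wordcount.py -- sketch of a job against a standalone Spark master
# co-located with a Hadoop cluster. Hostnames and paths are hypothetical; a
# standalone master URL has the form spark://<host>:7077 by default.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://spark-master.example.com:7077")
        .setAppName("standalone-wordcount"))
sc = SparkContext(conf=conf)

# Read a file already sitting in HDFS and count words on the Spark workers.
lines = sc.textFile("hdfs://namenode.example.com:8020/tmp/sample.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, count in counts.take(10):
    print("%s\t%d" % (word, count))

sc.stop()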

