Typical Hadoop setup for remote job submission

So, I'm still a bit new to Hadoop and I'm currently in the process of setting up a small test cluster on Amazon AWS. My question is about how best to structure the cluster so that jobs can be submitted from remote computers.

I currently have 5 machines. Four of them form the actual Hadoop cluster (namenode, datanodes, etc.), and one machine is the management machine (Cloudera Manager). I'm going to describe my thinking on the setup, and if anyone can point out the parts I've misunderstood, that would be great.

I wondered what the best setup for a small cluster would be, and decided to expose only the management machine and submit all jobs through it. The other machines would see each other, etc., but would not be reachable from the outside world. I have a conceptual idea of how to do this, but I'm not sure how to do it correctly, so if someone could point me in the right direction that would be great.

Another important point is that I want to be able to submit jobs to the cluster through the exposed machine from a client machine (possibly Windows). I don't fully understand this part either: do I have to have Hadoop installed on the client machine in order to use the normal hadoop commands and to write/submit jobs, for example from Eclipse or something similar?

So, to sum it up, my questions are:

  • Is this a common setup for a small test cluster?
  • How can I use one exposed machine to submit/route jobs to the cluster without running any Hadoop nodes on it?
  • How do I configure a client machine to submit jobs to a remote cluster, and is there an example of how to do this on Windows? Also, is there any reason not to use Windows as the client machine in this setup?

Thanks, I would really appreciate any advice or help on this.



1 answer


Since this has not been answered, I will try to answer it.

1. REST API for submitting applications:

Resource 1 (Cluster Applications API, Submit Application): https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application

Resource 2: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_yarn-resource-management/content/ch_yarn_rest_apis.html

Resource 3: https://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api

Resource 4: Start a MapReduce job through the REST API
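
The flow described in Resource 1 is two POSTs: one to /ws/v1/cluster/apps/new-application to obtain an application ID, then one to /ws/v1/cluster/apps with the submission JSON. Below is a minimal sketch of that flow in plain Java; the ResourceManager address (master.example.com:8088), the echo command in the container spec, and the crude string parsing are all placeholders and assumptions, not a production client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class YarnRestSubmit {
    // ResourceManager web address; placeholder, replace with your cluster's host.
    static final String RM = "http://master.example.com:8088";

    public static void main(String[] args) throws Exception {
        // Step 1: ask the ResourceManager for a fresh application ID.
        String newApp = post(RM + "/ws/v1/cluster/apps/new-application", "");

        // Step 2: pull the application-id out of the response JSON
        // (string hacking for brevity; use a JSON parser in real code).
        String appId = newApp.split("\"application-id\":\"")[1].split("\"")[0];

        // Step 3: submit the application. The JSON follows the Cluster
        // Applications API (Submit Application) schema; the command and
        // resource numbers here are illustrative only.
        String body = "{"
                + "\"application-id\":\"" + appId + "\","
                + "\"application-name\":\"rest-test\","
                + "\"application-type\":\"YARN\","
                + "\"am-container-spec\":{\"commands\":{\"command\":\"echo hello\"}},"
                + "\"resource\":{\"memory\":256,\"vCores\":1}"
                + "}";
        System.out.println(post(RM + "/ws/v1/cluster/apps", body));
    }

    static String post(String url, String json) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod("POST");
        c.setRequestProperty("Content-Type", "application/json");
        c.setDoOutput(true);
        try (OutputStream os = c.getOutputStream()) {
            os.write(json.getBytes("UTF-8"));
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(c.getInputStream(), "UTF-8"))) {
            for (String line; (line = r.readLine()) != null; ) sb.append(line);
        }
        return sb.toString();
    }
}
```

Note that this submits a generic YARN application; launching a full MapReduce job this way means describing the MapReduce ApplicationMaster in am-container-spec, which is the awkward part the forum threads above discuss.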

2. Submitting a Hadoop job from the client machine

Resource 1: https://pravinchavan.wordpress.com/2013/06/18/submitting-hadoop-job-from-client-machine/
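
The gist of that post is that the client only needs the Hadoop client libraries on its classpath and a Configuration pointing at the remote cluster; no Hadoop daemons run on the client itself. A minimal sketch, assuming a hypothetical hostname master.example.com for the exposed machine, that verifies a (possibly Windows) client can reach the NameNode:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientConnectivityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the remote NameNode through the exposed machine;
        // hostname and port are placeholders for your setup.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://master.example.com:8020"), conf);
        // Listing the root directory proves the client can reach HDFS.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
```

As for Windows specifically: the Hadoop client libraries do run there, but they expect HADOOP_HOME to be set and winutils.exe to be present, which is the usual practical argument for preferring a Linux client machine.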

3. Submitting the program to a remote Hadoop cluster

You can submit a program to a remote Hadoop cluster to run it. All you need to do is make sure you set the resource manager address, fs.defaultFS, library files, and mapreduce.framework.name correctly before submitting the actual job, as in the sketch below. Resource 1: ( how to submit a mapreduce job with yarn api in java )
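
For illustration, here is a minimal sketch of such a submission, assuming the hadoop-client libraries are on the classpath; the hostnames, ports, and HDFS paths are placeholders for your cluster, and the job is a pass-through (identity mapper/reducer) just to prove remote submission works:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteJobSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote cluster instead of local defaults.
        conf.set("fs.defaultFS", "hdfs://master.example.com:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "master.example.com:8032");
        // If the job used custom mapper/reducer classes, you would also
        // ship their jar so the cluster nodes can load them:
        // conf.set("mapreduce.job.jar", "/local/path/to/my-job.jar");

        Job job = Job.getInstance(conf, "remote-submit-test");
        // No mapper/reducer set: the identity classes copy input to output,
        // so TextInputFormat's (LongWritable, Text) pairs pass through.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/user/test/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/test/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```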
