MapReduce slot definition

I am on my way to becoming a Hadoop (Cloudera) admin. From the beginning I have heard a lot about compute slots per machine in a Hadoop cluster, such as determining the number of map slots and reduce slots.

I searched the internet for a beginner-friendly definition of a map/reduce slot but couldn't find one.

I got really frustrated digging through the PDFs that explain MapReduce configuration.

Please explain what exactly a compute slot on a cluster machine means.



4 answers


In MapReduce v1, mapreduce.tasktracker.map.tasks.maximum and mapreduce.tasktracker.reduce.tasks.maximum in mapred-site.xml are used to set the number of map slots and reduce slots respectively on each TaskTracker.
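As a sketch, a mapred-site.xml fragment setting these two limits might look like the following (the slot counts are illustrative, not recommendations):

```xml
<!-- mapred-site.xml (MRv1): slots per TaskTracker, illustrative values -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>8</value>   <!-- at most 8 map tasks run concurrently on this node -->
</property>
<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>   <!-- at most 4 reduce tasks run concurrently on this node -->
</property>
```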



Since MapReduce v2 (YARN), the more general term container is used instead of slot. Containers determine the maximum number of tasks that can run in parallel on a node, regardless of whether they are map tasks, reduce tasks, or application master tasks (in YARN).
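For illustration, the per-node container capacity in YARN is governed by the NodeManager resource settings in yarn-site.xml; the scheduler carves containers out of this pool rather than out of fixed map/reduce slots (values below are made up):

```xml
<!-- yarn-site.xml: total resources a NodeManager offers for containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value>   <!-- e.g. 56 GB of a 64 GB node left for containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>28</value>      <!-- e.g. 28 of 32 cores left for containers -->
</property>
```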



In general, it depends on the CPU and memory of each node in the cluster. In our cluster we configured 20 map slots and 15 reduce slots on machines with 32 cores and 64 GB of memory.
1. Approximately one slot requires one CPU core.
2. The number of map slots should be slightly higher than the number of reduce slots.
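Plugging those numbers into the MRv1 properties mentioned above, the 32-core / 64 GB nodes would carry a mapred-site.xml fragment along these lines (slot counts taken from this answer, purely illustrative):

```xml
<!-- mapred-site.xml on a 32-core, 64 GB TaskTracker node -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>20</value>   <!-- map slots per node -->
</property>
<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>15</value>   <!-- reduce slots per node -->
</property>
```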





In MRv1, each machine had a fixed number of map slots and reduce slots. In general, each machine is configured with roughly a 4:1 map:reduce slot ratio.

  • logically, you read a lot of data (maps) and crunch it down to a much smaller set (reduces).

The container concept arrived with MRv2, and any container can run a map task, a reduce task, or a shell script.



A little late, but I will answer anyway.

Compute slot: you can think of every computation in Hadoop as consuming some resource, i.e. memory, CPU, or disk space.

Resource = the memory, processor cores, or disk space that a computation requires.

You allocate resources to run a container, allocate resources to execute a map or reduce task, and so on.

It's all about how you want to manage the resources you have: RAM, cores, and disk space.

The goal is to ensure that your processing is not constrained by any one of these cluster resources. You want your processing to be as dynamic as possible.

As an example, Hadoop YARN allows you to configure the minimum RAM required to run a YARN container, the minimum RAM a map/reduce task requires, the JVM heap size (for map and reduce tasks), and the amount of virtual memory each task may use.
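These knobs live in yarn-site.xml and mapred-site.xml. The property names below are the standard YARN/MRv2 ones; the values are made up for illustration:

```xml
<!-- yarn-site.xml -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>    <!-- smallest container the scheduler will grant -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>     <!-- virtual memory allowed per unit of physical memory -->
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>    <!-- container size requested for a map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>    <!-- container size requested for a reduce task -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1228m</value>   <!-- JVM heap for the map task, below the container size -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2457m</value>   <!-- JVM heap for the reduce task -->
</property>
```

The heap sizes are deliberately set below the container sizes so the JVM's overhead fits inside the container's memory limit.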

Unlike Hadoop MR1, you do not pre-configure fixed capacity (for example, a fixed RAM size per slot) before the map and reduce tasks even start. The idea is that resource allocation should be as elastic as possible, i.e. RAM or CPU cores can be assigned dynamically to a map or reduce task.







