How to run MapReduce tasks in parallel in Hadoop 2.x?

I would like my map and reduce tasks to run in parallel. However, despite trying every trick in the book, they still run sequentially. I read in How to set the exact maximum number of concurrent tasks per node in Hadoop 2.4.0 on Elastic MapReduce that the number of tasks executed in parallel per node is given by the following formula.

min(yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb,
    yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores)
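
Plugging in the values from my configs below, that formula predicts eight concurrent containers per node:

    min(131072 / 16384, 64 / 8) = min(8, 8) = 8

Yet only one map or reduce task runs at a time.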


However, I did exactly that, as you can see from the yarn-site.xml and mapred-site.xml I am using below, and the tasks still run sequentially. Please note that I am using open-source Apache Hadoop, not Cloudera. Would switching to Cloudera solve the problem? Also note that my input files are big enough that dfs.block.size should not be an issue either.

yarn-site.xml

    <configuration>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>131072</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>64</value>
    </property>
    </configuration>


mapred-site.xml

    <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>16384</value>
    </property>

    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>16384</value>
    </property>

    <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>8</value>
    </property>

    <property>
        <name>mapreduce.reduce.cpu.vcores</name>
        <value>8</value>
    </property>
    </configuration>




1 answer


A container is the logical execution unit that YARN reserves for running Map/Reduce tasks on each node in the cluster.

The property yarn.nodemanager.resource.memory-mb tells the YARN NodeManager how much of the node's RAM may be allocated to containers. It is the upper bound on the total memory reserved for all Map/Reduce containers on that node, not a per-container limit.

But in your case the free memory on the node is only about 11 GB, while you configured yarn.nodemanager.resource.memory-mb to almost 128 GB (131072) and mapreduce.map.memory.mb / mapreduce.reduce.memory.mb to 16 GB each. The requested size for a Map/Reduce container is 16 GB, which exceeds the 11 GB of memory actually free. That could be why only one container is allocated per node.



Reduce mapreduce.map.memory.mb and mapreduce.reduce.memory.mb well below the amount of free memory so that more than one container can run in parallel.
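
As a minimal sketch, assuming the node really has about 11 GB usable for containers (the exact values are illustrative and should be sized to your node and workload), the configs could look like this:

    <!-- yarn-site.xml: cap container memory at what the node actually has free -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>11264</value>
    </property>

    <!-- mapred-site.xml: request smaller containers so several fit at once -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>
    </property>

With these values, the formula from the question gives min(11264 / 2048, 64 / 8) ≈ min(5, 8) = 5 concurrent containers per node instead of one.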

Also look for ways to free up memory on the node, since more than 90% of it is already in use.

Hope this helps :)
