Why doesn't Hadoop use the other hard drives?

This is my first time playing with a Hadoop cluster, so I am very new to this.

I have a small cluster of 3 nodes, each with 5 x 2 TB hard drives. All machines run Ubuntu, have identical hardware specifications, and use Apache Hadoop 1.0.4. The drives are mounted as /media/diskb, /media/diskc, /media/diskd, etc. on each respective machine and are configured as JBOD.

All 3 machines serve as DataNodes and TaskTrackers; one of them is also the NameNode and SecondaryNameNode, the second is the JobTracker, and the third is a plain DataNode/TaskTracker (DN/TT) node.

In hdfs-site.xml on each machine, I have the mount points as comma-separated values with no spaces:

<property>
 <name>dfs.datanode.data.dir</name>
 <value>/data/dfs/data,/media/diskb/data/dfs/data,/media/diskc/data/dfs/data,..</value>
</property>


For the NameNode:

<property>
 <name>dfs.namenode.name.dir</name>
 <value>/data/dfs/name,/media/diskb/data/dfs/name,/media/diskc/data/dfs/name,..</value>
</property>


In mapred-site.xml:

<property>
 <name>mapred.local.dir</name>
 <value>/data/mapred/local,/media/diskb/data/mapred/local,/media/diskc/data/mapred/local,...</value>
</property>


And in core-site.xml:

<property>
 <name>hadoop.tmp.dir</name>
 <value>/media/diskb/data</value>
</property>


(I was experimenting with pointing the temp directory at one of the mounted disks to check permissions, etc., and Hadoop works fine with it.)

Mount permissions and directory ownership are set correctly for the Hadoop user account. When I run a MapReduce job, I can see Hadoop creating working folders in the mapred/local directories on the extra drives of each node, but I don't see the same happening for the DataNode data directories, and the configured capacity reported on the admin page (namenode:50070) is only 5.36 TB (1.78 TB per node).

Why isn't Hadoop using every hard drive? Together they should provide a total capacity of about 26.7 TB.

Also, I don't see any performance gain in MapReduce jobs between using all the disks and using just one disk per node. What should I expect?

Thanks!


1 answer


OK, really simple answer: dfs.namenode.name.dir should be dfs.name.dir, and dfs.datanode.data.dir should be dfs.data.dir. Those are the correct property names in Hadoop 1.x.
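For reference, a minimal sketch of the corrected hdfs-site.xml using the 1.x property names, with the same paths as in the question (the trailing ".." stands in for the remaining disks, as in the original):

<property>
 <name>dfs.name.dir</name>
 <value>/data/dfs/name,/media/diskb/data/dfs/name,/media/diskc/data/dfs/name,..</value>
</property>

<property>
 <name>dfs.data.dir</name>
 <value>/data/dfs/data,/media/diskb/data/dfs/data,/media/diskc/data/dfs/data,..</value>
</property>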



I thought they (dfs.name.dir, dfs.data.dir) were deprecated, but apparently not in 1.x. So Hadoop silently ignored the unrecognized property names and fell back to its built-in defaults, which are derived from hadoop.tmp.dir as set in core-site.xml, which is why only 3 disks (one per node) were being used.
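After renaming the properties and restarting the cluster, the block directories should appear on the extra drives, and hadoop dfsadmin -report (or the NameNode web UI on port 50070) should show the full configured capacity across all five disks per node.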
