Configuring Hadoop Cluster (Fully Distributed Mode)

I am installing Hadoop on a multi-node cluster and I have a few questions:

  • Is it good to have the NameNode and the ResourceManager on the same machine?

  • What is the best role for the master host: NameNode, ResourceManager, or DataNode/NodeManager?

  • I have a master and 3 slaves. The slaves file on the master machine contains the following:

    master
    slave1
    slave2
    slave3

Should I place this same slaves file on all the slave machines? Or should I delete the first line (master) before copying it to the slaves?

Regards.



2 answers


  • Yes, at least in small clusters those two should run on the master node.
  • See answer 1. The master node can also run, for example, the SecondaryNameNode and the JobHistoryServer.
  • No, the slaves file lives only on the master node. If the master node is listed in the slaves file, it means the master node also acts as a DataNode; especially in small clusters that is totally fine. The slaves file essentially specifies on which nodes the DataNode (and NodeManager) processes are started.
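As a sketch of the last point: on a standard Hadoop install the slaves file sits in the configuration directory on the master only (in Hadoop 3.x the file was renamed to `workers`). Assuming the hostnames from the question and a conventional `$HADOOP_HOME` layout, it would look like this:

```shell
# On the master node only; path assumes a standard Hadoop install.
# Hadoop 2.x: $HADOOP_HOME/etc/hadoop/slaves
# Hadoop 3.x: $HADOOP_HOME/etc/hadoop/workers
cat "$HADOOP_HOME/etc/hadoop/slaves"
# master   <- keep this line only if the master should also run a DataNode
# slave1
# slave2
# slave3
```

Removing the `master` line keeps the master node free of DataNode/NodeManager duties, which is the usual choice once the cluster grows.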

Slave nodes should only run the DataNode and NodeManager daemons. All of this is handled by Hadoop as long as the configuration is correct: after starting the cluster from the master node, you can simply check which processes are running on each machine. The master node basically takes care of everything, and you should "never" need to log in to the slaves manually for any configuration.
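A minimal way to do that check, assuming passwordless SSH from the master (which Hadoop's start scripts require anyway) and that `jps` from the JDK is on each node's PATH:

```shell
# Run on the master node. Hadoop starts the remote daemons over SSH
# based on the slaves/workers file.
start-dfs.sh
start-yarn.sh

# On the master you would expect NameNode and ResourceManager (plus
# SecondaryNameNode / JobHistoryServer if configured to run there):
jps

# On each slave you would expect only DataNode and NodeManager:
ssh slave1 jps
```

If a slave is missing a daemon, its log under `$HADOOP_HOME/logs/` on that node is the first place to look.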



My answer applies to small clusters; in large "real" clusters the server responsibilities are typically separated even further.



To fully understand the concept of a multi-node cluster, follow this link - http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

and to implement a multi-node cluster step by step, follow this link - http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/



Hope these links help you.







