Configuring Hadoop Cluster (Fully Distributed Mode)
I am installing Hadoop on a multi-node cluster and I have a few questions:
- Is it good to have the NameNode and the ResourceManager on the same machine?
- What is the best role for the host machine: NameNode, ResourceManager, or DataNode/NodeManager?
- I have a master and 3 slaves. The slaves file on the master machine contains the following entries:

  master
  slave1
  slave2
  slave3

  Should I place this same slaves file on all slave machines, or should I delete the first line (master) before putting it on the slave machines?
Regards.
- Yes, at least in small clusters those two should run on the master node.
- See answer 1. The master node can also host, for example, the SecondaryNameNode and the JobHistoryServer.
- No, the slaves file is needed only on the master node. If the master node is listed in the slaves file, it means the master node also acts as a DataNode; especially in small clusters that is totally fine. The slaves file essentially specifies on which nodes the DataNode processes are started.
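For illustration, reusing the hostnames from the question: a slaves file (typically found at etc/hadoop/slaves in a Hadoop 2.x install) that also runs worker daemons on the master would simply list one hostname per line:

```
master
slave1
slave2
slave3
```

Dropping the first line would make the master run only the master daemons (NameNode, ResourceManager), with no DataNode/NodeManager on it.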
Slave nodes should run only the DataNode and NodeManager. All of this is handled by Hadoop if the configuration is correct: you can simply check which processes are running after starting the cluster from the master node. The master node basically takes care of everything, and you should "never" need to connect to a slave manually for configuration.
My answer applies to small clusters; in large "real" clusters the server responsibilities are split up even further.
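As a sketch of that check, assuming a typical Hadoop 2.x install with the start scripts on the PATH and passwordless SSH from master to slaves already set up (these commands need a configured cluster, so treat them as illustrative):

```shell
# On the master: start HDFS and YARN across the whole cluster
start-dfs.sh
start-yarn.sh

# List the Java daemons running locally on the master
# (jps ships with the JDK); expect NameNode and ResourceManager,
# plus SecondaryNameNode/JobHistoryServer if configured there
jps

# Check a slave without logging in interactively;
# expect DataNode and NodeManager
ssh slave1 jps
```

If a slave is missing its DataNode or NodeManager, the usual suspects are the slaves file and SSH/hostname configuration on the master.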
To fully understand the concept of a multi-node cluster, follow this link - http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/
and to set up a multi-node cluster step by step, follow this link - http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Hope these links help you.