Multi-node Hadoop cluster: datanodes are not working as expected
I am running Hadoop as a multi-node cluster (fully distributed mode), but each datanode has a different clusterID.
In slave1,
java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-2ecca585-6672-476e-9931-4cfef9946c3b
In slave2,
java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-e24b0548-2d8d-4aa4-9b8c-a336193c006e
I also went through this link, Datanode won't start correctly, but I don't know which clusterID I should choose. If I pick one of them, the datanode starts on that machine but not on the other. And when I format the namenode with the basic command (hadoop namenode -format), the datanodes start on each slave, but then the namenode on the master does not start.
The clusterIDs of the datanodes and the namenode must match; only then can the datanodes communicate with the namenode. If you format the namenode, a new clusterID is assigned to it, and the clusterIDs stored on the datanodes will no longer match.
You can find the VERSION files in the /home/pushuser1/hadoop/tmp/dfs/data/current/ directory (the datanode directory) and also in the namenode directory (/home/pushuser1/hadoop/tmp/dfs/name/current/, based on the value you specified for dfs.namenode.name.dir); both contain the clusterID.
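To see whether the two IDs actually differ, you can extract the clusterID line from each VERSION file and compare them. A minimal sketch, using sample files under /tmp so it is runnable anywhere; on a real cluster, point the paths at your dfs.namenode.name.dir and dfs.datanode.data.dir locations instead:

```shell
# Sample VERSION files mimicking the layout from the question
# (real paths: .../dfs/name/current/VERSION and .../dfs/data/current/VERSION).
mkdir -p /tmp/dfs/name/current /tmp/dfs/data/current
echo 'clusterID=CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c' > /tmp/dfs/name/current/VERSION
echo 'clusterID=CID-2ecca585-6672-476e-9931-4cfef9946c3b' > /tmp/dfs/data/current/VERSION

# Extract the clusterID value from each file and compare.
nn_id=$(grep '^clusterID=' /tmp/dfs/name/current/VERSION | cut -d= -f2)
dn_id=$(grep '^clusterID=' /tmp/dfs/data/current/VERSION | cut -d= -f2)
if [ "$nn_id" = "$dn_id" ]; then
    echo "clusterIDs match"
else
    echo "clusterIDs differ"
fi
```

With the IDs from the question, this prints "clusterIDs differ", which is exactly the mismatch the IOException reports.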
Whenever you format the namenode, stop all HDFS services first and clear all files in the following directories:
rm -rf /home/pushuser1/hadoop/tmp/dfs/data/* (execute on every datanode)
rm -rf /home/pushuser1/hadoop/tmp/dfs/name/* (execute on the namenode)
and then format HDFS again (hadoop namenode -format).
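Note that wiping the data directories discards any blocks already stored in HDFS. If you want to keep the existing data instead, the usual alternative is the reverse of "which ID should I choose": always take the namenode's clusterID and write it into each datanode's VERSION file while the datanode is stopped. A sketch on a sample file; on a real node you would edit /home/pushuser1/hadoop/tmp/dfs/data/current/VERSION:

```shell
# Demo VERSION file with the stale datanode clusterID from slave1.
mkdir -p /tmp/demo
echo 'clusterID=CID-2ecca585-6672-476e-9931-4cfef9946c3b' > /tmp/demo/VERSION

# The namenode's clusterID (from the error message / the namenode's VERSION file).
NN_CID='CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c'

# Overwrite the datanode's clusterID line with the namenode's value.
sed -i "s/^clusterID=.*/clusterID=$NN_CID/" /tmp/demo/VERSION
grep '^clusterID=' /tmp/demo/VERSION
```

After this edit, restarting the datanode lets it register with the namenode, since the IDs now match; no data directories are deleted.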