Multi-node Hadoop cluster: datanodes are not working as expected

I am running Hadoop as a multi-node cluster (distributed mode), but each datanode has a different cluster ID.

In slave1,

java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-2ecca585-6672-476e-9931-4cfef9946c3b

In slave2,

java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-e24b0548-2d8d-4aa4-9b8c-a336193c006e

I also went through this link: Datanode won't start correctly, but I don't know which cluster ID I should choose. If I pick either ID, the datanode starts on that machine but not on the other one. And when I format the namenode with the basic command (hadoop namenode -format), the datanodes start on each slave, but then the namenode on the master does not start.


1 answer


The cluster IDs of the datanodes and the namenode must match; only then can the datanodes communicate with the namenode. If you format the namenode, a new cluster ID is assigned to it, and the clusterID stored on each datanode no longer matches.

You can find the VERSION files in /home/pushuser1/hadoop/tmp/dfs/data/current/ (the datanode directory) and also in the namenode directory (/home/pushuser1/hadoop/tmp/dfs/name/current/, based on the value you specified for dfs.namenode.name.dir); each of them contains the clusterID.
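
To compare them directly, print the clusterID line from both VERSION files (paths as above; adjust them if your dfs.namenode.name.dir or dfs.datanode.data.dir points somewhere else):

grep clusterID /home/pushuser1/hadoop/tmp/dfs/name/current/VERSION    # on the master (namenode)
grep clusterID /home/pushuser1/hadoop/tmp/dfs/data/current/VERSION    # on each slave (datanode)

Both commands must print the same CID value; in your logs they do not.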

When you are ready to format the namenode, stop all HDFS services, then clear all files in the following directories



rm -rf /home/pushuser1/hadoop/tmp/dfs/data/*    # execute on every datanode
rm -rf /home/pushuser1/hadoop/tmp/dfs/name/*    # execute on the namenode (master)

and format HDFS again:

hadoop namenode -format
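
Once the format succeeds, restart HDFS and check that every datanode registers with the new cluster ID (this assumes the standard Hadoop scripts are on your PATH):

start-dfs.sh
hdfs dfsadmin -report    # each slave should show up as a live datanode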

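Alternatively, if you would rather keep the existing HDFS data than wipe it, a common workaround is to copy the namenode's clusterID into each datanode's VERSION file. A minimal sketch, assuming GNU sed and the default paths above (stop HDFS first and back up the file):

sed -i 's/^clusterID=.*/clusterID=CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c/' /home/pushuser1/hadoop/tmp/dfs/data/current/VERSION

The CID value here is the namenode clusterID from your error messages; the namenode's ID is the one to choose, since it is the authority for the cluster.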