Hadoop: Datanodes available: 0 (total 0, 0 dead)
Every time I run:
hadoop dfsadmin -report
I am getting the following output:
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
- There is no data directory in my dfs/ folder.
- A lock file exists in this folder: in_use.lock
- The master (namenode), job tracker, and data node daemons all appear to be running fine.
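(For reference, a quick way to confirm which daemons are actually running on each node, assuming the JDK's jps tool is on the PATH, is simply:
jps
Each expected daemon, i.e. NameNode and JobTracker on the master, DataNode and TaskTracker on the slaves, should appear in its output.)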
I had exactly the same problem, and when I checked the datanode logs there were a lot of "could not connect to master:9000" errors. When I checked the listening ports on the master via netstat -ntlp, I got this in the output:
tcp 0 0 127.0.1.1:9000 ...
I realized that I needed to either change the hostname or change master in all the configuration files. I decided on the first, since it seemed much easier. So I edited /etc/hosts, changed 127.0.1.1 master to 127.0.1.1 master-machine, and added an entry at the end of the file like this:
192.168.1.1 master
Then I changed master to master-machine in /etc/hostname and restarted the machine. The problem disappeared.
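For reference, this is roughly what the two files look like after the change (the 192.168.1.1 address and the master-machine name are just this answer's examples; substitute your own):
/etc/hosts:
127.0.0.1    localhost
127.0.1.1    master-machine
192.168.1.1  master
/etc/hostname:
master-machine
After the reboot, netstat -ntlp should show the namenode listening on 192.168.1.1:9000 instead of 127.0.1.1:9000, so the datanodes can actually reach it.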
Usually this happens because the datanode hits namespace ID errors. So remove the name dir from the master and remove the data dir from the datanodes, then format the namenode and try start-dfs.sh. It usually takes some time for the report to reflect all the data. Even I was getting 0 datanodes, but after a while the master discovered the slaves.
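If you want to confirm the mismatch before deleting anything, compare the VERSION files kept under the namenode and datanode storage directories (the paths below are only examples; use whatever dfs.name.dir / dfs.data.dir, or dfs.namenode.name.dir / dfs.datanode.data.dir, point to). The IDs recorded there (namespaceID, clusterID) must agree between master and slaves:
cat /usr/local/hadoop_tmp/hdfs/namenode/current/VERSION   # on the master (assumed path)
cat /usr/local/hadoop_tmp/hdfs/datanode/current/VERSION   # on each datanode (assumed path)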
Just solve the problem by following these steps:
- Make sure the IP addresses for the master and slaves are correct in /etc/hosts.
- If you really don't need the data, run stop-dfs.sh, delete all the data directories on the master/slave nodes, then run hdfs namenode -format and start-dfs.sh. This should recreate HDFS and fix the problem; a command sketch follows below.
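A minimal sketch of that second step, assuming the name and data directories are /usr/local/hadoop_tmp/hdfs/namenode and /usr/local/hadoop_tmp/hdfs/datanode (check your hdfs-site.xml for the real paths; this wipes everything stored in HDFS):
stop-dfs.sh
rm -rf /usr/local/hadoop_tmp/hdfs/namenode/*   # on the master (assumed path)
rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*   # on every slave (assumed path)
hdfs namenode -format
start-dfs.sh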
Just formatting the namenode didn't work for me. So I checked the logs in $HADOOP_HOME/logs. In the secondarynamenode log I encountered this error:
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -64 namespaceID = 2095041698 cTime = 1552034190786 ; clusterId = CID-db399b3f-0a68-47bf-b798-74ed4f5be097 ; blockpoolId = BP-31586866-127.0.1.1-1552034190786.
Expecting respectively: -64; 711453560; 1550608888831; CID-db399b3f-0a68-47bf-b798-74ed4f5be097; BP-2041548842-127.0.1.1-1550608888831.
at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:143)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:550)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:360)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:325)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:321)
at java.lang.Thread.run(Thread.java:748)
So I stopped Hadoop and then reformatted the namenode, explicitly passing the cluster id reported in the log:
hdfs namenode -format -clusterId CID-db399b3f-0a68-47bf-b798-74ed4f5be097
This fixed the problem.
There is another, less obvious reason why this can happen: your datanode did not start as expected, but everything else worked.
In my case, when looking at the log, I found that the required port 50010 was already in use by SideSync (for macOS). I found this via
sudo lsof -iTCP -n -P | grep 0010
but you can use similar methods to determine what has already taken your well-known datanode port.
Disconnecting and restarting fixed the issue.
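If you know which port your datanode should bind to, you can query it directly instead of grepping (50010 is the classic default; substitute yours):
sudo lsof -iTCP:50010 -sTCP:LISTEN -n -P
If that prints anything before the datanode is started, some other process owns the port; stop it and start HDFS again.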
Also, if you installed Hadoop/YARN as root but keep the data directories under a separate user's home directory, and then try to run it as that separate user, you will need to make the datanode directory accessible to that user; a sketch follows below.
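A hedged sketch of that fix, assuming the datanode directory sits under /home/hduser/hdfs/datanode and HDFS runs as user hduser in group hadoop (the path, user, and group are just examples):
sudo chown -R hduser:hadoop /home/hduser/hdfs/datanode   # hand ownership to the HDFS user (assumed names)
sudo chmod -R 755 /home/hduser/hdfs/datanode             # make the tree readable/traversable
Note that the parent directories also need execute permission for that user, otherwise the datanode still cannot reach its storage dir.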