Namenode metadata store for all file blocks

While reading Hadoop: The Definitive Guide, I came across the following line:

The namenode also knows the datanodes on which all the blocks for a given file are located; however, it does not store block locations persistently, because this information is reconstructed from datanodes when the system starts.

I am struggling to understand how this works. Say I copy a 1 GB file into an 8-node cluster with a replication factor of 3. Each datanode would then hold 1 block, and these blocks would be replicated to other nodes, so each node ends up holding up to 3 blocks in total. The namenode should then contain an index with the location of each block. But according to the text, the namenode does not persist block locations, so how are they restored after the cluster is shut down and restarted? It seems it would be unable to tell which block belongs to which file. Can someone please explain this to me?



2 answers


The namenode persists some state about the files (name, path, size, block size, block IDs, etc.), but not the physical locations of the blocks.

When the datanodes start up, they effectively walk the tree of the dfs data directory, discovering all the file blocks they hold, and once finished, they tell the namenode which blocks they are hosting.
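As a rough illustration of that startup scan, here is a minimal Python sketch. It is not Hadoop's actual code: the functions `scan_block_ids` and `demo_scan` are made up for this example, and it relies only on the fact that block data files on a datanode are named `blk_<id>`, with checksum files ending in `.meta` next to them.

```python
import os
import re
import tempfile

# Block data files are named blk_<id>; names with a suffix such as
# blk_<id>_<n>.meta are checksum files and are deliberately not matched.
BLOCK_FILE = re.compile(r"^blk_(-?\d+)$")


def scan_block_ids(data_dir):
    """Walk data_dir recursively and return the sorted block IDs found on disk."""
    block_ids = []
    for _root, _dirs, files in os.walk(data_dir):
        for name in files:
            match = BLOCK_FILE.match(name)
            if match:
                block_ids.append(int(match.group(1)))
    return sorted(block_ids)


def demo_scan():
    """Create a throwaway directory with fake block files and scan it."""
    with tempfile.TemporaryDirectory() as data_dir:
        subdir = os.path.join(data_dir, "subdir0")  # hypothetical layout
        os.makedirs(subdir)
        for name in ("blk_1001", "blk_1001_5.meta", "blk_1002", "VERSION"):
            open(os.path.join(subdir, name), "w").close()
        return scan_block_ids(data_dir)


if __name__ == "__main__":
    print(demo_scan())
```

The list returned by the scan plays the role of the block report that the datanode sends to the namenode.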



The namenode builds a map from files to block locations out of the reports it receives from each datanode.
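To make this concrete, here is a toy Python sketch of how such a map could be assembled. The names `NAMESPACE`, `build_block_map` and `locate_file`, the sample path, and the datanode names are all hypothetical; the real namenode uses its own internal data structures.

```python
from collections import defaultdict

# Persisted namespace (hypothetical sample data): file path -> ordered block IDs.
# This is the part the namenode keeps on disk.
NAMESPACE = {"/data/big.bin": [1001, 1002, 1003]}


def build_block_map(block_reports):
    """Turn {datanode: [block IDs]} reports into {block ID: set of datanodes}."""
    locations = defaultdict(set)
    for datanode, block_ids in block_reports.items():
        for block_id in block_ids:
            locations[block_id].add(datanode)
    return locations


def locate_file(path, namespace, locations):
    """Return (block ID, sorted datanodes) for each block of the file, in order."""
    return [(b, sorted(locations.get(b, ()))) for b in namespace[path]]


if __name__ == "__main__":
    reports = {"dn1": [1001, 1002], "dn2": [1002, 1003], "dn3": [1001, 1003]}
    print(locate_file("/data/big.bin", NAMESPACE, build_block_map(reports)))
```

The file-to-block-ID list comes from persisted metadata, while the block-ID-to-datanode part exists only after the reports arrive, which is why nothing is lost by not saving it.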

This is one of the reasons it sometimes takes a few minutes to leave safe mode when the cluster is first started - if you have a lot of files, it may take a few minutes for each datanode to discover and report the blocks it is hosting.



Each fsimage file contains a serialized form of all the directory and file inodes in the filesystem. Each inode is an internal representation of a file's or directory's metadata and contains information such as the file's replication level, modification and access times, access permissions, block size, and the blocks that make up the file. For directories, the modification time, permissions, and quota metadata are stored. The fsimage file does not record the datanodes on which the blocks are stored. Instead, the namenode keeps this mapping in memory, which it builds by asking the datanodes for their block lists when they join the cluster and periodically thereafter, to ensure that the namenode's block mapping is up to date.
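The split between what is persisted and what lives only in memory can be sketched in a few lines of Python. This is a toy model under stated assumptions: `FileInode` and `ToyNamenode` are invented names, and JSON stands in for the real fsimage format, which is different.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FileInode:
    path: str
    replication: int
    block_size: int
    block_ids: list


class ToyNamenode:
    def __init__(self, inodes=None):
        self.inodes = inodes or []   # persisted via save_image()
        self.block_locations = {}    # in memory only, never written to the image

    def save_image(self):
        """Serialize the inodes only (JSON here is a stand-in for fsimage)."""
        return json.dumps([asdict(inode) for inode in self.inodes])

    @classmethod
    def load_image(cls, image):
        """Simulate a restart: inodes come back, block locations start empty."""
        return cls([FileInode(**entry) for entry in json.loads(image)])

    def receive_block_report(self, datanode, block_ids):
        """Record which datanode holds which blocks, as reported by the datanode."""
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)


if __name__ == "__main__":
    nn = ToyNamenode([FileInode("/a", 3, 128 * 1024 * 1024, [1, 2])])
    nn.receive_block_report("dn1", [1, 2])
    restarted = ToyNamenode.load_image(nn.save_image())
    print(restarted.inodes, restarted.block_locations)
```

After the simulated restart the block IDs of each file are still known, and the location map fills up again as soon as datanodes send their reports.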


