How are HDFS files stored on the underlying OS file system?

HDFS is a logical file system in Hadoop with a block size of 64 MB. An HDFS file is, in turn, stored on the underlying OS file system, say ext4, which has a block size of 4 KiB.
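To make the two block sizes concrete, here is a rough arithmetic sketch. It assumes "64 MB" means 64 MiB and uses a hypothetical 1 GiB file; the function names are made up for illustration and are not part of any Hadoop API.

```python
import math

# Sizes quoted in the question; treating "64 MB" as 64 MiB is an assumption.
HDFS_BLOCK_SIZE = 64 * 1024 * 1024  # one HDFS block
EXT4_BLOCK_SIZE = 4 * 1024          # one ext4 block

def hdfs_block_count(file_size_bytes):
    """Number of HDFS blocks needed for a file (the last block may be partial)."""
    return math.ceil(file_size_bytes / HDFS_BLOCK_SIZE)

def ext4_blocks_per_hdfs_block():
    """Number of 4 KiB ext4 blocks that back one full HDFS block."""
    return HDFS_BLOCK_SIZE // EXT4_BLOCK_SIZE

file_size = 1024 ** 3  # hypothetical 1 GiB file
print(hdfs_block_count(file_size))    # 16
print(ext4_blocks_per_hdfs_block())   # 16384
```

So under these assumptions a 1 GiB file becomes 16 HDFS blocks, and each full HDFS block still occupies 16384 ext4 blocks on whichever machine holds it.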

As far as I know, for a file on the local file system, the OS locates each 4 KiB block on the physical hard disk by its start and end cylinders in order to retrieve it. Since HDFS files are also stored on the ext4 file system, they must also be retrieved through 4 KiB blocks in the same way.

If so, HDFS by itself would not increase the speed of data retrieval. So the question is: what technique does HDFS use at the disk level to increase retrieval speed?

Thank you in advance



1 answer


Your reasoning is right: the read speed of the ext filesystem itself does not change. What changes is that a large file is split into 64 MB blocks that are stored on different machines. When a read or scan request is made, multiple machines process their blocks of the file simultaneously and send the results back to the requester (the NameNode only keeps track of where the blocks are located). The job therefore finishes faster overall. It is like 10 men completing a building task in 1 day instead of 1 man taking 10 days.
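The divide-and-combine idea can be sketched on a single machine. This is only an illustration, not how Hadoop is implemented: worker threads stand in for the separate machines, the tiny block size is arbitrary, and `split_into_blocks`, `scan_block` and `parallel_scan` are hypothetical names. Counting newline bytes is used as the "scan" because a single byte can never straddle a block boundary.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def scan_block(block):
    """Work done independently on one block: count newline bytes."""
    return block.count(b"\n")

def parallel_scan(data, block_size, workers=4):
    """Scan all blocks concurrently, then combine the partial results."""
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(scan_block, blocks))
    return sum(partial_counts)

data = b"line\n" * 1000
# The combined result matches a single sequential pass over the data.
assert parallel_scan(data, block_size=512) == data.count(b"\n")
```

Threads on one machine will not show a real speedup here; the point is only that each block can be processed without looking at the others, which is what lets separate machines work on one file at the same time.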









