How does read(2) work in Linux C?

Question

How does read(2) work in Linux C?

According to the man page, we can specify the number of bytes we want to read from the file descriptor.

But inside the implementation of read, how many read requests are issued to perform the read?

For example, if I want to read 4MB, will it create a single 4MB request, or split it into multiple smaller requests, e.g. 4KB per request?

+3

c linux system

Michael Tong May 13 '15 at 3:59


5 answers

If data is available, read returns immediately with as much of it as fits into the buffer. If no data is available, it waits until some arrives and then returns what it can without waiting any longer.

How much that is depends on what the file descriptor refers to. If it is a socket, it will be whatever is in the socket buffer. If it is a file, it will be whatever is in the buffer cache.

+1

Chris Dodd May 13 '15 at 4:08


When you call read, it makes only one request to fill the buffer, and if it cannot fill the entire buffer (there is no more data, or the data is still arriving, as with sockets), it returns the number of bytes it actually wrote into your buffer.

As the manual says:

RETURN VALUE

Upon successful completion, these functions shall return a non-negative integer indicating the number of bytes actually read. Otherwise, the functions shall return -1 and set errno to indicate the error.
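Since a successful read may return fewer bytes than requested, code that needs an exact amount usually calls it in a loop. The sketch below shows one common way to do that; `read_full` is an illustrative name, not a library function, and it assumes a blocking file descriptor (on a non-blocking one it would return -1 with errno set to EAGAIN).

```c
#include <errno.h>
#include <unistd.h>

/* Illustrative helper: keeps calling read(2) until `count` bytes have
 * arrived, end-of-file is reached, or a real error occurs. Returns the
 * number of bytes stored in `buf` (less than `count` only at
 * end-of-file), or -1 with errno set. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n == 0)
            break;              /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;      /* short read: ask for the remainder */
    }
    return (ssize_t)done;
}
```

Each pass through the loop is still exactly one read system call; the loop only exists because a single call is allowed to come back short.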

+1

mdh.heydari May 13 '15 at 4:09 am


It depends on how deep you go.

The C library just passes the size you gave it straight to the kernel in one read() system call, so at this level it's just one request.

Inside the kernel, for a regular file in standard buffered mode, the 4MB you requested will be copied from multiple pagecache pages (4kB each) that are unlikely to be contiguous. Any part of the file that is not already in the pagecache must be read from disk. The file might not be stored contiguously on disk, so that 4MB could result in multiple requests to the underlying block device.
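To get a feel for the pagecache side of this, one can ask the kernel which pages of a file are currently resident in memory. This is a rough sketch using mmap(2) and mincore(2); the function name `resident_pages` is invented here, and on newer kernels mincore may report conservative results for files the caller does not own.

```c
#define _DEFAULT_SOURCE         /* for mincore() in <sys/mman.h> */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative helper: returns how many pages of the file behind `fd`
 * are resident in memory and stores the file's total page count in
 * *total. Returns -1 on error or for an empty file. */
long resident_pages(int fd, long *total)
{
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    *total = (long)npages;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is in memory */
    }

    free(vec);
    munmap(map, len);
    return resident;
}
```

With 4kB pages, a 4MB file spans 1024 pages; any of them that this function does not count as resident would have to come from the block device on the next read.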

+1

caf May 13 '15 @ 4:12 am


There's really no single right answer other than "as many as necessary", at whatever layer the request takes place. Typically, one request will be passed to the kernel. This may result in no further requests going to other layers, because all the information is already in memory. But if the data has to be read from, say, a software RAID, requests may have to be issued to multiple physical devices to satisfy the request.

I don't think you can really give a better answer than "what the developer thought was the best way".

+1

David Schwartz May 13 '15 @ 4:12 am


MichaelMoser · Accepted Answer · 2015-05-13T04:51:07+0000

read(2) is a system call, so the C library wrapper traps into the kernel to issue it (in the very old days this was a software interrupt; today there are faster entry mechanisms, which 32-bit x86 reaches through a helper in the vDSO shared library).
Inside the kernel, the call is first handled by the VFS (virtual file system) layer, which provides a common interface for inodes (the in-kernel structures that represent files) and a common way to interact with the underlying file system.
The VFS dispatches to the underlying file system (mount(8) will tell you which mount points exist and which file system is in use at each). See http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture16.pdf for more information.
The file system can do its own caching, so the number of disk reads depends on what is in the cache, on how the file system allocates blocks for a particular file, and on how the file is divided into disk blocks; all of these are questions for the specific file system.
If you want to do your own caching, open the file with the O_DIRECT flag; the kernel then tries to bypass the cache. However, all reads must then start at a 512-byte-aligned offset and have a length that is a multiple of 512 (this is required so that your buffer can be transferred via DMA to and from the backing store; see http://www.quora.com/Why-does-O_DIRECT-require-IO-to-be-512-byte-aligned ).
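As a rough illustration of the O_DIRECT constraints, the sketch below reads the start of a file through an aligned bounce buffer. The name `direct_read` and the `DIRECT_ALIGN` value of 4096 are assumptions made for this example (4096 is a multiple of 512 and also suits devices with 4kB logical blocks); some file systems reject O_DIRECT altogether, in which case open fails and the function returns -1.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096       /* assumed alignment; a multiple of 512 */

/* Illustrative helper: reads `len` bytes from offset 0 of `path` with
 * O_DIRECT and copies them to `out`. `len` must be a non-zero multiple
 * of DIRECT_ALIGN. Returns the byte count from read(2), or -1. */
ssize_t direct_read(const char *path, char *out, size_t len)
{
    if (len == 0 || len % DIRECT_ALIGN != 0)
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd == -1)
        return -1;              /* e.g. EINVAL: no O_DIRECT support here */

    /* The user buffer itself must be aligned too, so malloc() is not
     * enough; posix_memalign() returns suitably aligned memory. */
    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, len) != 0) {
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, len);
    if (n > 0)
        memcpy(out, buf, (size_t)n);

    free(buf);
    close(fd);
    return n;
}
```

The copy into `out` is only there so the caller does not need an aligned buffer of its own; a program that really does its own caching would keep and reuse the aligned buffer instead.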
