How smart is mmap?

mmap can be used to share read-only memory between processes, reducing the memory footprint:

  • process P1 mmaps the file and uses the mapped memory -> data is loaded into RAM
  • process P2 mmaps the same file and uses the mapped memory -> the OS reuses the same memory

But how about this:

  • process P1 mmaps the file, loads it into memory, and then exits.
  • another process P2 mmaps the same file and accesses memory that is still hot from P1's access.

Is the data loaded from disk again? Is the OS smart enough to reuse the memory even if the "mmap count" temporarily dropped to zero?

Does the behavior differ between operating systems? (I'm most interested in Linux / OS X.)

EDIT: In case the OS is not smart enough, would it help to have one "background process" that keeps the file mmapped, so that it never leaves the address space of at least one process?

I am, of course, interested in performance when I mmap and munmap the same file repeatedly and rapidly, possibly (but not necessarily) in the same process.

EDIT2: I see answers spending a lot of words on completely irrelevant points. To restate the question: can I rely on Linux / OS X not to reload data that is already in memory because it was paged in through earlier mmapped segments, even though that particular region is no longer mmapped by any process?

2 answers


Whether a file's contents are in memory is much less tied to the mmap system call than you might think. When you mmap a file, it isn't necessarily loaded into memory. When you munmap it (or the process exits), its pages aren't necessarily dropped.

Many different things can cause a file's contents to be loaded into memory: mapping it, reading it, executing it, or accessing memory that is mapped to it. Likewise, various things can cause a file's contents to be evicted from memory, mostly the OS deciding that it needs the memory for something more important.
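To check this empirically (mapping does not necessarily load a file, and unmapping does not necessarily drop its pages), you can ask the kernel which pages of a file are currently resident. Below is a minimal sketch, assuming a POSIX system with mincore(), which exists on both Linux and macOS; the last parameter is declared unsigned char * on Linux and char * on macOS, hence the void * cast. The name resident_fraction is hypothetical. As far as I know, Linux 5.0 and later restrict this for files the caller could not open for writing (mincore() may then simply report every page as resident), so run the check on a file you own.

```c
#define _DEFAULT_SOURCE  /* exposes mincore() in glibc; harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the fraction (0.0 to 1.0) of a file's pages
 * that the kernel reports as resident, or -1.0 on error. mmap() without
 * MAP_POPULATE does not read the file, and mincore() does not fault pages
 * in, so the check itself does not load the data. */
double resident_fraction(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {  /* empty files cannot be mapped */
        close(fd);
        return -1.0;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return -1.0;

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);  /* sysconf(_SC_PAGESIZE) should not fail */
    size_t npages = (len + (size_t)page - 1) / (size_t)page;

    unsigned char *vec = malloc(npages);  /* one status byte per page */
    double result = -1.0;
    if (vec != NULL && mincore(addr, len, (void *)vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;  /* low bit set = page is resident */
        result = (double)resident / (double)npages;
    }

    free(vec);
    munmap(addr, len);
    return result;
}
```

Calling this just before step 2 of either scenario shows whether P1's pages are still cached; a value close to 1.0 suggests P2 will not have to go to disk.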

In the two scenarios of your question, consider inserting a step between steps 1 and 2:

  • 1.5. another process allocates and uses a large amount of memory -> the mmapped file is evicted from memory to free up space.

In this case, the contents of the file may have to be reloaded into memory when they are mapped and used again in step 2.



versus

  • 1.5. nothing happens -> the contents of the mmapped file stay in memory.

In this case, the contents of the file do not need to be reloaded in step 2.

In terms of what happens to the contents of your file, your two scenarios are not very different from each other. Something like this step 1.5 would make a much bigger difference.

As for a background process that constantly accesses the file to make sure it stays in memory (for example, by scanning the file and then sleeping for a short time, in a loop): this would of course make the file much less likely to be evicted, but you're probably better off just letting the OS decide for itself when to evict a file and when not to.
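The scan-and-sleep background process described above could look roughly like this: map the file once, then read one byte per page and sleep, in a loop. This is a minimal sketch assuming a POSIX system; touch_pages and keep_warm are hypothetical names, and the rounds parameter exists only so the loop can terminate (a real keeper would loop forever). Touching pages keeps them recently used, which makes eviction less likely but does not prevent it under memory pressure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reads one byte from every page so that each page is faulted back in if it
 * was evicted and is marked as recently used. Returns the sum of the bytes
 * read so the compiler cannot optimize the loads away. */
unsigned long touch_pages(const void *addr, size_t len)
{
    const volatile unsigned char *p = addr;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);  /* sysconf(_SC_PAGESIZE) should not fail */

    unsigned long sum = 0;
    for (size_t off = 0; off < len; off += (size_t)page)
        sum += p[off];
    return sum;
}

/* Hypothetical keeper loop: maps the file once, then rescans it every
 * interval_sec seconds for `rounds` iterations. Returns 0 on success,
 * -1 if the file cannot be opened or mapped. */
int keep_warm(const char *path, unsigned interval_sec, unsigned rounds)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return -1;

    for (unsigned i = 0; i < rounds; i++) {
        touch_pages(addr, len);
        sleep(interval_sec);
    }

    munmap(addr, len);
    return 0;
}
```

Note that the mapping alone does nothing to keep pages resident; it is the periodic access that matters, which is why the answer's "mmap count" framing matters less than memory pressure.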


The second process will probably find the data from the first process in the buffer cache. Therefore, in most cases, the data will not be loaded from disk again. But since the buffer cache is a cache, there is no guarantee that the pages will not be evicted in between.

You can start a third process and use mmap(2) and mlock(2) to lock the pages in RAM. But this will probably cause more problems than it is worth.
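A minimal sketch of that mmap(2) + mlock(2) approach, assuming a POSIX system; map_and_lock and unlock_and_unmap are hypothetical names. mlock() faults every page of the range in (from the cache or from disk) and keeps it resident until it is unlocked or unmapped. It is subject to the RLIMIT_MEMLOCK resource limit, so it can fail (for example with ENOMEM or EPERM), which is one of the practical problems with this approach.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and locks every page of the
 * mapping in RAM. Returns the mapping (and its length via *len_out),
 * or NULL on failure. */
void *map_and_lock(const char *path, size_t *len_out)
{
    assert(len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    /* Can fail, e.g. with ENOMEM or EPERM, when RLIMIT_MEMLOCK is too small. */
    if (mlock(addr, len) != 0) {
        munmap(addr, len);
        return NULL;
    }

    *len_out = len;
    return addr;
}

/* Releases the lock and the mapping; the pages may be evicted again later. */
void unlock_and_unmap(void *addr, size_t len)
{
    munlock(addr, len);
    munmap(addr, len);
}
```

The third process would call map_and_lock() once and then block (for example in pause()) for as long as the pages should stay pinned; the locks are released automatically when that process exits.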



Linux replaced the UNIX buffer cache with the page cache, but the principle is still the same. The Mac OS X equivalent is called the Unified Buffer Cache (UBC).
