Understanding the output of GHC's +RTS -t -RTS option
I am comparing the memory consumption of a Haskell program compiled with GHC. To do this, I run the program with the following command-line arguments:
+RTS -t -RTS
Here is an example output:
<<ghc: 86319295256 bytes, 160722 GCs, 53963869/75978648 avg/max bytes residency (386 samples), 191M in use, 0.00 INIT (0.00 elapsed), 152.69 MUT (152.62 elapsed), 58.85 GC (58.82 elapsed) :ghc>>
According to the GHC manual, the output shows:
- The total number of bytes allocated by the program during the entire run.
- The total number of garbage collections performed.
- Average and maximum residency, which is the amount of live data in bytes. The runtime can only determine the amount of live data during a major GC, so the number of samples corresponds to the number of major GCs (and is usually relatively small).
- Peak memory allocated by the RTS from the OS.
- The amount of CPU time and elapsed wall-clock time spent initializing the runtime system (INIT), running the program itself (MUT, the mutator), and garbage collecting (GC).
Applied to my example, this means that my program allocates about 82,320 MiB in total (bytes divided by 1024^2), performs 160,722 garbage collections, has an average/maximum memory residency of 51 MiB/72 MiB, allocates at most 191M of memory, and so on.
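The conversions above can be checked with a few lines of Haskell. This is only a sketch of the arithmetic: the helper name `toMiB` is made up for illustration, and treating the "191M in use" figure as MiB is an assumption.

```haskell
module Main where

import Text.Printf (printf)

-- | Convert a byte count to MiB (1 MiB = 1024^2 bytes).
-- Hypothetical helper, introduced only for this illustration.
toMiB :: Integer -> Double
toMiB bytes = fromIntegral bytes / (1024 * 1024)

main :: IO ()
main = do
  -- Figures copied from the example -t output in the question.
  let allocated    = 86319295256 :: Integer
      avgResidency = 53963869    :: Integer
      maxResidency = 75978648    :: Integer
      inUseMiB     = 191         :: Double  -- "191M in use", assumed to mean MiB
  printf "total allocated:        %.1f MiB\n" (toMiB allocated)
  printf "average residency:      %.1f MiB\n" (toMiB avgResidency)
  printf "maximum residency:      %.1f MiB\n" (toMiB maxResidency)
  -- The gap between what the RTS took from the OS and the peak live data.
  printf "in use - max residency: %.1f MiB\n" (inUseMiB - toMiB maxResidency)
```

The last line prints the difference between peak memory in use and maximum residency, which is the roughly 120M the question asks about.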
Now I want to know how "average and maximum residency, which is the amount of live data in bytes" relates to "peak memory allocated by the RTS from the OS". And also: what is using the remaining space of approximately 120M?
I looked here for more information, but it does not clearly state what I want to know. Another source (section 5.4.4, second item) suggests that the 120M of memory is used for garbage collection. But this is too vague; I need a citable source of information.
So please, is there anyone who could answer my questions with good sources as evidence?
Residency is the amount of live Haskell data you have. The amount of memory the RTS actually allocates from the OS may be higher.
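If you want to compare the two figures from inside the program, the `GHC.Stats` module in `base` exposes them. Below is a minimal sketch, not a definitive recipe: it assumes the program is run with `+RTS -T -RTS` (which may require building with `-rtsopts`), the helper `overheadBytes` is a name invented here, and the summation is only a placeholder workload.

```haskell
module Main where

import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Memory obtained from the OS beyond the peak live data.
-- Hypothetical helper, introduced only for this illustration.
overheadBytes :: Word64 -> Word64 -> Integer
overheadBytes inUse live = fromIntegral inUse - fromIntegral live

main :: IO ()
main = do
  -- Placeholder workload so that some allocation and GC takes place.
  print (sum [1 .. 1000000 :: Integer])
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are disabled; run with +RTS -T -RTS."
    else do
      stats <- getRTSStats
      putStrLn ("max live bytes:       " ++ show (max_live_bytes stats))
      putStrLn ("max mem in use bytes: " ++ show (max_mem_in_use_bytes stats))
      putStrLn ("difference:           "
                ++ show (overheadBytes (max_mem_in_use_bytes stats)
                                       (max_live_bytes stats)))
```

Here `max_live_bytes` corresponds to maximum residency and `max_mem_in_use_bytes` to the peak memory taken from the OS.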
The RTS allocates memory in "blocks". If your program needs 7.3 blocks' worth of RAM, the RTS has to allocate 8 blocks, 0.7 of a block of which is unused.
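The rounding can be sketched as a ceiling division. The block size used below is purely illustrative (an assumption, not a statement about the sizes the RTS actually uses), and `blocksNeeded` and `slack` are hypothetical names.

```haskell
module Main where

-- | Number of whole blocks needed to hold `needed` bytes (ceiling division).
blocksNeeded :: Integer -> Integer -> Integer
blocksNeeded blockSize needed = (needed + blockSize - 1) `div` blockSize

-- | Bytes that are allocated but unused after rounding up to whole blocks.
slack :: Integer -> Integer -> Integer
slack blockSize needed = blocksNeeded blockSize needed * blockSize - needed

main :: IO ()
main = do
  let blockSize = 1024 * 1024  -- illustrative block size (assumption)
      needed    = 7654605      -- roughly 7.3 blocks' worth of data
  putStrLn ("blocks allocated: " ++ show (blocksNeeded blockSize needed))
  putStrLn ("unused bytes:     " ++ show (slack blockSize needed))
```

The smaller the block size relative to the heap, the less this rounding matters, so on its own it is unlikely to explain a gap of over 100M.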
The default garbage collection algorithm is a two-space copying collector. That is, when space A fills up, it allocates space B (which is completely empty), copies all live data from space A to space B, and then frees space A. This means that, for a while, you can be using up to 2x as much RAM as the data itself needs. (I believe there is a switch somewhere that selects a one-space algorithm, which is slower but uses less RAM.)
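To see why both spaces count towards the footprint, here is a toy model of a copying collection. It is a deliberately simplified sketch and not how the GHC RTS is implemented: objects keep their addresses instead of being relocated, and every name in it (`Obj`, `reachable`, `collect`, `spaceBytes`) is invented for illustration.

```haskell
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Addr = Int

-- | A toy heap object: a size in bytes plus pointers to other objects.
data Obj = Obj { objSize :: Int, objRefs :: [Addr] }

type Space = M.Map Addr Obj

-- | Addresses reachable from the roots, i.e. the live data.
reachable :: Space -> [Addr] -> S.Set Addr
reachable from = go S.empty
  where
    go seen [] = seen
    go seen (a : rest)
      | a `S.member` seen = go seen rest
      | otherwise = case M.lookup a from of
          Nothing  -> go seen rest
          Just obj -> go (S.insert a seen) (objRefs obj ++ rest)

-- | "Copy" only the live objects into a fresh to-space.
collect :: Space -> [Addr] -> Space
collect from roots = M.restrictKeys from (reachable from roots)

-- | Total size of all objects in a space.
spaceBytes :: Space -> Int
spaceBytes = sum . map objSize . M.elems

main :: IO ()
main = do
  let fromSpace = M.fromList
        [ (1, Obj 40 [2]), (2, Obj 24 []), (3, Obj 64 [1]), (4, Obj 16 []) ]
      toSpace = collect fromSpace [1]  -- object 1 is the only root
  -- Until from-space is freed, both spaces are held at the same time.
  putStrLn ("from-space bytes: " ++ show (spaceBytes fromSpace))
  putStrLn ("to-space bytes:   " ++ show (spaceBytes toSpace))
  putStrLn ("held during GC:   " ++ show (spaceBytes fromSpace + spaceBytes toSpace))
```

If almost everything in space A is live, space B ends up nearly as large as A, which is where the factor of two comes from.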
There is also some overhead for thread management (especially if you have lots of threads), and there may be a few other things.
I don't know how much you already know about GC technology, but you could try reading these: