Seemingly conflicting results when profiling memory in Go
I recently ran some numeric codes written in the Start Large Datasets section and ran into memory management issues. When trying to profile the problem, I measured memory usage in my program in three ways: with the Go package runtime/pprof
with a unix utility, time
and by manually adding the size of the data I allocated. These three methods do not give me consistent results.
Below is a simplified version of the code I'm profiling. It selects multiple chunks, puts the values ββat each index, and puts them in the parent slice:
package main
import (
"fmt"
"os"
"runtime/pprof"
"unsafe"
"flag"
)
var mprof = flag.String("mprof", "", "write memory profile to this file")
func main() {
flag.Parse()
N := 1<<15
psSlice := make([][]int64, N)
_ = psSlice
size := 0
for i := 0; i < N; i++ {
ps := make([]int64, 1<<10)
for i := range ps { ps[i] = int64(i) }
psSlice[i] = ps
size += int(unsafe.Sizeof(ps[0])) * len(ps)
}
if *mprof != "" {
f, err := os.Create(*mprof)
if err != nil { panic(err) }
pprof.WriteHeapProfile(f)
f.Close()
}
fmt.Printf("total allocated: %d MB\n", size >> 20)
}
Doing this with the command $ time time -f "%M kB" ./mem_test -mprof=out.mprof
results in the output:
total allocated: 256 MB
1141216 kB
real 0m0.150s
user 0m0.031s
sys 0m0.113s
Here, the first number, 256MB, only represents the size of the arrays calculated from unsafe.Sizeof
, and the second number, 1055MB, is what it says time
. Running the pprof tool results in
(pprof) top1
Total: 108.2 MB
107.8 99.5% 99.5% 107.8 99.5% main.main
These results are scaled in the same way you would expect them to be shorter or longer.
Why aren't these three numbers lining up more closely?
First, you need to provide an error free example. Let's start with the base numbers. For example,
package main
import (
"fmt"
"runtime"
"unsafe"
)
func WriteMatrix(nm [][]int64) {
for n := range nm {
for m := range nm[n] {
nm[n][m]++
}
}
}
func NewMatrix(n, m int) [][]int64 {
a := make([]int64, n*m)
nm := make([][]int64, n)
lo, hi := 0, m
for i := range nm {
nm[i] = a[lo:hi:hi]
lo, hi = hi, hi+m
}
return nm
}
func MatrixSize(nm [][]int64) int64 {
size := int64(0)
for i := range nm {
size += int64(unsafe.Sizeof(nm[i]))
for j := range nm[i] {
size += int64(unsafe.Sizeof(nm[i][j]))
}
}
return size
}
var nm [][]int64
func main() {
n, m := 1<<15, 1<<10
var ms1, ms2 runtime.MemStats
runtime.ReadMemStats(&ms1)
nm = NewMatrix(n, m)
WriteMatrix(nm)
runtime.ReadMemStats(&ms2)
fmt.Println(runtime.GOARCH, runtime.GOOS)
fmt.Println("Actual: ", ms2.TotalAlloc-ms1.TotalAlloc)
fmt.Println("Estimate:", n*3*8+n*m*8)
fmt.Println("Total: ", ms2.TotalAlloc)
fmt.Println("Size: ", MatrixSize(nm))
// check top VIRT and RES for COMMAND peter
for {
WriteMatrix(nm)
}
}
Output:
$ go build peter.go && / usr / bin / time -f "% M KiB" ./peter amd64 linux Actual: 269221888 Estimate: 269221888 Total: 269240592 Size: 269221888 ^ C Command exited with non-zero status 2 265220 KiB $ $ top VIRT 284268 RES 265136 COMMAND peter
Is this what you expected?
See MatrixSize
for the correct way to calculate the memory size.
In an endless loop that allows us to use the command top
, bind the matrix as resident by updating it.
What results will you get when you run this program?
Mistake:
Your result from /usr/bin/time
is equal 1056992 KiB
, which is four times too large. This is a bug in your version /usr/bin/time
, ru_maxrss
reported in KBytes not pages. My Ubuntu version has been fixed.
Literature:
Re: GNU Time: Wrong Results
time-1.7 thinks rusage does not work on Linux
GNU Projects Archive: Time
"time" 1.7-24 is the original package in Ubuntu . ru_maxrss
reported in KBytes not pages. (Closes: # 649402)
# 649402 - [PATCH] time overestimates max RSS by 4 times - Debian bug report logs
Subject: Correcting ru_maxrss messages Author: Richard Kettlewell Bug-Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649402
--- time-1.7.orig/time.c +++ time-1.7/time.c @@ -392,7 +398,7 @@ ptok ((UL) resp->ru.ru_ixrss) / MSEC_TO_TICKS (v)); break; case 'M': /* Maximum resident set size. */ - fprintf (fp, "%lu", ptok ((UL) resp->ru.ru_maxrss)); + fprintf (fp, "%lu", (UL) resp->ru.ru_maxrss); break; case 'O': /* Outputs. */ fprintf (fp, "%ld", resp->ru.ru_oublock);