Valgrind and time give opposite results
I have some (Fortran) code that accumulates data into an array, basically doing this:
complex,dimension(4000)::a,b
complex :: c
[...]
a=0.
do i=1,20000
   b=foo(...)
   c=bar(...)
   a=a+b*c
end do
Using Callgrind, I found that most of my program's effort goes into executing the line
a=a+b*c
so I'm wondering if I can do anything to speed this up. As a starting point, I tried a BLAS library optimized for my system and replaced this line with
call caxpy(4000,c,b,1,a,1)
In Callgrind's reports, this reduces the "Ir" count for the entire program by about 40%. However, the execution time, as measured by "time", increases by about 20%.
I expected the execution time to be roughly proportional to the number of instructions executed, so the two measures should give comparable results (time reports 99% CPU usage). What am I missing here?
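For anyone who wants to reproduce the comparison, here is a minimal sketch that times only the accumulation loop, once with the array expression and once with caxpy. It is not the original program: the constant values assigned to b and c are placeholders for whatever foo(...) and bar(...) return, the program name time_axpy is made up, and it assumes a BLAS library is available at link time (for example something like gfortran -O2 time_axpy.f90 -lblas, depending on the system).

```fortran
program time_axpy
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 4000, niter = 20000
  complex, dimension(n) :: a, b
  complex :: c
  integer :: i
  integer(int64) :: t0, t1, rate
  external :: caxpy   ! single-precision complex AXPY from BLAS; must be linked in

  ! Placeholder data (assumption): the real code fills b and c
  ! from foo(...) and bar(...) on every iteration.
  b = (1.0, 0.5)
  c = (0.25, -0.75)

  ! Variant 1: Fortran array expression
  a = 0.
  call system_clock(t0, rate)
  do i = 1, niter
     a = a + b*c
  end do
  call system_clock(t1)
  print *, 'array expression:', real(t1 - t0)/real(rate), 's   a(1) =', a(1)

  ! Variant 2: BLAS call, computing a := c*b + a
  a = 0.
  call system_clock(t0)
  do i = 1, niter
     call caxpy(n, c, b, 1, a, 1)
  end do
  call system_clock(t1)
  print *, 'caxpy:           ', real(t1 - t0)/real(rate), 's   a(1) =', a(1)
end program time_axpy
```

Printing a(1) after each loop keeps the compiler from discarding the work and gives a quick check that both variants produce the same result. The timings cover wall-clock time for the loops alone, so they leave out program startup and the cost of foo and bar, which the "time" command would include.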