Valgrind and time give opposite results

I have some (Fortran) code that accumulates data into an array, basically doing this:

complex,dimension(4000)::a,b
complex :: c
[...]
a=0.
do i=1,20000
    b=foo(...)
    c=bar(...)
    a=a+b*c
end do

      

Using Callgrind, I found that most of my program's time is spent executing the line

a=a+b*c

      

so I'm wondering whether I can do anything to speed this up. As a starting point, I linked against a BLAS library optimized for my system and replaced this line with

call caxpy(4000,c,b,1,a,1)

      

In Callgrind's reports, this reduces the "Ir" count for the entire program by about 40%. However, the execution time, as measured by "time", increases by about 20%.
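
To take foo and bar out of the picture, the two variants can also be timed in isolation. The following is only a minimal sketch under some assumptions: the program name, the random placeholder data standing in for foo(...) and bar(...), the fixed value of c, and the use of system_clock for wall-clock timing are all made up for illustration, and the program has to be linked against a BLAS library (for example with -lblas).

```fortran
program axpy_timing
  implicit none
  integer, parameter :: n = 4000, niter = 20000
  complex :: a(n), b(n), c
  real :: re(n), im(n)
  integer(kind=8) :: t0, t1, rate
  integer :: i
  external :: caxpy   ! BLAS level-1 routine: y := alpha*x + y (single-precision complex)

  ! Placeholder data; in the real program b and c come from foo(...) and bar(...)
  call random_number(re)
  call random_number(im)
  b = cmplx(re, im)
  c = (0.5, 0.25)

  ! Variant 1: Fortran array expression
  a = 0.
  call system_clock(t0, rate)
  do i = 1, niter
     a = a + b*c
  end do
  call system_clock(t1)
  ! Printing an element of a keeps the compiler from discarding the loop
  print '(a,f8.4,a,2es12.4)', 'array expr: ', real(t1 - t0)/real(rate), ' s, a(1) = ', a(1)

  ! Variant 2: BLAS caxpy, same arguments as in the call shown above
  a = 0.
  call system_clock(t0)
  do i = 1, niter
     call caxpy(n, c, b, 1, a, 1)
  end do
  call system_clock(t1)
  print '(a,f8.4,a,2es12.4)', 'caxpy:      ', real(t1 - t0)/real(rate), ' s, a(1) = ', a(1)
end program axpy_timing
```

Both loops should print the same a(1) up to rounding. Because b and c do not change between iterations here, an optimizing compiler may treat the first loop differently than it does in the real program, so the timings are only indicative.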

I expected the execution time to be roughly proportional to the number of instructions executed, so the two measurements should point in the same direction (time reports 99% CPU usage, so the program is not waiting on I/O). What am I missing here?
