C++ profiling: number of clock cycles

I am using valgrind --tool=callgrind to profile an important part of my C++ program.

The part itself takes less than a microsecond to execute, so I profile it by running it in a loop many times.

I noticed that the cost of each instruction is a multiple of 0.13% (as a percentage of the total program execution time). So I only see 0.13, 0.26, 0.52, and so on.

My question is: can I assume that this atomic quantity corresponds to one CPU cycle? See the screenshot. (The callgrind output is displayed graphically with kcachegrind.)

[Screenshot: callgrind output displayed in kcachegrind]

Edit: By the way, looking at the machine code, I see that a mov takes 0.13%, so it probably really is one clock cycle.



1 answer


Callgrind does not measure CPU time. It counts instruction reads (instruction fetches), which is where the term "Ir" comes from. If everything is a multiple of 0.13% (especially since you checked this against a mov), then 0.13% corresponds to one instruction read. There are also cache simulation options that let you estimate how likely you are to incur cache misses.
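To make the arithmetic concrete, here is a minimal sketch of how a percentage like 0.13% falls out of instruction-read counts. The helper name `ir_share_percent` and the numbers in the usage note are made up for illustration; they are not taken from your profile.

```cpp
#include <cassert>
#include <cstdint>

// Share (in percent) of the total instruction reads (Ir) that is
// attributed to a single instruction.
// Hypothetical helper for illustration; not part of callgrind.
inline double ir_share_percent(std::uint64_t ir_of_instruction,
                               std::uint64_t ir_total) {
    assert(ir_total > 0 && ir_of_instruction <= ir_total);
    return 100.0 * static_cast<double>(ir_of_instruction) /
           static_cast<double>(ir_total);
}
```

For example, assuming a run of 1000000 loop iterations with 769 instruction reads per iteration, an instruction executed once per iteration gets `ir_share_percent(1000000, 769000000)`, which is about 0.13, and one executed twice per iteration gets about 0.26. The quantum is a count of instruction reads, not a count of clock cycles.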



Please note that not all instructions take the same amount of time to execute, so the percentages do not correspond to the time spent in each section. However, they still give you an idea of where your program does most of its work and is therefore likely to spend most of its time.
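As a rough illustration of why equal Ir counts need not mean equal time, here is a sketch that weights cache-miss counts on top of instruction reads. The struct, the function name, and the penalty values (10 and 100) are assumptions chosen for the example, not measured costs for any particular CPU.

```cpp
#include <cstdint>

// Event counts for one section of code, of the kind a cache
// simulation can report. Field names are hypothetical.
struct EventCounts {
    std::uint64_t ir;         // instruction reads
    std::uint64_t l1_misses;  // first-level cache misses
    std::uint64_t ll_misses;  // last-level cache misses
};

// Assumed penalties, in "cycle-like" units. Illustrative values only.
constexpr std::uint64_t kL1MissPenalty = 10;
constexpr std::uint64_t kLLMissPenalty = 100;

// Crude cost estimate: each instruction read costs 1 unit,
// and each miss adds its assumed penalty.
constexpr std::uint64_t estimated_cost(const EventCounts& e) {
    return e.ir + kL1MissPenalty * e.l1_misses +
           kLLMissPenalty * e.ll_misses;
}
```

With these assumed weights, two sections that both show 1000 Ir come out at 1000 and 1250 units if the second one also has 5 first-level and 2 last-level misses, so the Ir percentages alone would understate the second section.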
