C++ profiling: number of clock cycles
I am using valgrind --tool=callgrind to profile a critical part of my C++ program.
The part itself takes less than a microsecond to execute, so I profile it by running it in a loop many times.
I noticed that the cost shown for each instruction is always a multiple of 0.13% (as a percentage of the total program execution time). So I only see 0.13, 0.26, 0.52, and so on.
My question is: should I assume that this smallest unit corresponds to one CPU clock cycle? See photo. (The callgrind output is displayed graphically with kcachegrind.)
Edit: By the way, looking at the machine code, I see that a mov takes 0.13%, so it probably really is one clock cycle.
Callgrind does not measure CPU time. It counts instruction reads (instruction fetches); that is where the term "Ir" comes from. If the smallest unit is 0.13% (especially since you verified it with a mov), it corresponds to a single instruction read per loop iteration, not to a clock cycle. There are also cache simulation options that let you estimate how many cache misses you are likely to have.
Please note that not all instructions take the same amount of time to execute, so the percentages do not correspond to the actual time spent in each section. However, they still give you an idea of where your program does most of its work and is therefore likely to spend most of its time.
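To make the 0.13% granularity concrete, here is a minimal sketch of the arithmetic, assuming the profiled loop accounts for essentially all of the program's instruction reads. The function names and the counts in the usage note are hypothetical and are not taken from your profile; real counts would come from the callgrind output.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Percentage of the total instruction-read (Ir) count that is attributed
// to one instruction. Both arguments are hypothetical inputs; in practice
// they would be read from the callgrind output.
double ir_share_percent(std::uint64_t instruction_ir, std::uint64_t total_ir) {
    assert(total_ir > 0);
    return 100.0 * static_cast<double>(instruction_ir) /
           static_cast<double>(total_ir);
}

// Rough number of instruction reads per loop iteration implied by the
// smallest percentage step seen in the profile. This assumes the loop
// dominates the total Ir count and that the smallest step belongs to an
// instruction executed exactly once per iteration.
long long implied_reads_per_iteration(double smallest_step_percent) {
    assert(smallest_step_percent > 0.0);
    return std::llround(100.0 / smallest_step_percent);
}
```

Under those assumptions, a step of 0.13% suggests roughly 769 instruction reads per iteration (implied_reads_per_iteration(0.13)). Conversely, with 1,000,000 iterations of a loop that executes 770 instructions each, an instruction that runs once per iteration gets ir_share_percent(1000000, 770000000), which is about 0.13%, and one that runs twice per iteration gets about 0.26%. The step size is a property of the counts, not of the CPU clock.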