Can the Linux perf profiler be used inside C++ code?

I would like to measure the L1, L2 and L3 cache hit/miss ratios of some parts of my C++ code. I'm not interested in running perf on my entire application. Can perf be used as a library inside C++?

int main() {
    ...
    ...
    start_profiling();
    // The part I'm interested in
    ...
    end_profiling();
    ...
    ...
}

      

I gave Intel PCM a shot, but I ran into two problems. First, it gave me some weird numbers. Second, it does not support L1 cache profiling.

If this isn't possible with Perf, what's the easiest way to get this information?



2 answers


It looks like all you're trying to do is read a few performance counters, which is a perfect fit for the PAPI library.

Example.



The full list of supported counters is quite long, but it seems that you are most interested in PAPI_L1_TCM, PAPI_L1_TCA, and their L2 and L3 analogues. Note that you can also split read and write accesses, and you can distinguish between instruction and data caches.



Yes, there is dedicated per-thread monitoring that allows perf counters to be read from user space. See the man page for perf_event_open(2).

Since the generic perf cache events only cover L1i, L1d and the last-level cache, you will need to use the raw event type (PERF_TYPE_RAW) and event numbers from the manual for your processor to measure anything else, such as L2.
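As a rough illustration of the perf_event_open(2) route, here is a minimal sketch that counts one event for the calling thread around a region. RegionCounter and cache_config are hypothetical names made up for this example, not part of any library. The sketch assumes Linux and a perf_event_paranoid setting that lets your process open the event, and it uses the generic L1d read-miss event; for L2 you would pass PERF_TYPE_RAW with a config value taken from your CPU manual instead.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Encode a generic cache event for PERF_TYPE_HW_CACHE as documented in
// perf_event_open(2): cache id | (operation << 8) | (result << 16).
constexpr std::uint64_t cache_config(std::uint64_t cache, std::uint64_t op,
                                     std::uint64_t result) {
    return cache | (op << 8) | (result << 16);
}

// Hypothetical helper: counts one event for the calling thread between
// start() and stop(). User-space only (kernel and hypervisor excluded).
class RegionCounter {
public:
    RegionCounter(std::uint32_t type, std::uint64_t config) {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;
        attr.config = config;
        attr.disabled = 1;        // stay off until start() enables it
        attr.exclude_kernel = 1;
        attr.exclude_hv = 1;
        // glibc has no wrapper for this syscall, so go through syscall(2).
        // pid = 0, cpu = -1: the calling thread, on whichever CPU it runs.
        fd_ = static_cast<int>(
            syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
    }
    ~RegionCounter() {
        if (fd_ >= 0) close(fd_);
    }
    RegionCounter(const RegionCounter&) = delete;
    RegionCounter& operator=(const RegionCounter&) = delete;

    // False if the event could not be opened (no permission, no PMU, ...).
    bool ok() const { return fd_ >= 0; }

    void start() {
        if (!ok()) return;
        ioctl(fd_, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_, PERF_EVENT_IOC_ENABLE, 0);
    }

    // Stops counting and returns the count, or -1 if it is unavailable.
    long long stop() {
        if (!ok()) return -1;
        ioctl(fd_, PERF_EVENT_IOC_DISABLE, 0);
        std::uint64_t count = 0;
        if (read(fd_, &count, sizeof(count)) !=
            static_cast<ssize_t>(sizeof(count))) {
            return -1;
        }
        return static_cast<long long>(count);
    }

private:
    int fd_ = -1;
};

// Usage sketch:
//   RegionCounter misses(PERF_TYPE_HW_CACHE,
//       cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ,
//                    PERF_COUNT_HW_CACHE_RESULT_MISS));
//   misses.start();
//   /* the part you are interested in */
//   long long n = misses.stop();
```

For a miss ratio you would open a second counter with PERF_COUNT_HW_CACHE_RESULT_ACCESS and divide the two. Counters opened separately like this are not guaranteed to be scheduled over exactly the same interval unless you put them in one group via the group_fd argument.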

To implement profiling, you need to set a sampling interval (the sample_period field of perf_event_attr), then poll/select on the fd or wait for the SIGIO signal, and when that fires, read the sample and take the instruction pointer from it. You can then try to resolve the returned instruction pointers to function names using a debugger such as GDB.
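The sampling setup described above might be sketched like this. make_sampling_attr, ring_copy and drain_sample_ips are hypothetical helper names. The sketch assumes sample_type is exactly PERF_SAMPLE_IP, so that each PERF_RECORD_SAMPLE body starts with the 64-bit instruction pointer, and that the ring buffer comes from mmap()ing the event fd, with the data area starting one page after the metadata page. Opening the fd and waiting on it with poll() are only outlined in the trailing comment.

```cpp
#include <linux/perf_event.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Describe a sampling event: one sample, carrying the instruction pointer,
// per `period` occurrences of the CPU-specific raw event `raw_config`.
inline perf_event_attr make_sampling_attr(std::uint64_t raw_config,
                                          std::uint64_t period) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = raw_config;
    attr.sample_period = period;
    attr.sample_type = PERF_SAMPLE_IP;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.wakeup_events = 1;  // wake poll/select after every sample
    return attr;
}

// Copy `len` bytes starting at logical offset `pos`, wrapping around the
// end of the ring's data area.
inline void ring_copy(void* dst, const char* data, std::size_t data_size,
                      std::uint64_t pos, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        static_cast<char*>(dst)[i] = data[(pos + i) % data_size];
    }
}

// Walk all records between data_tail and data_head, collect the instruction
// pointer of every sample record, and mark the records as consumed.
inline std::vector<std::uint64_t> drain_sample_ips(perf_event_mmap_page* meta,
                                                   const char* data,
                                                   std::size_t data_size) {
    std::vector<std::uint64_t> ips;
    std::uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
    std::uint64_t tail = meta->data_tail;
    while (tail < head) {
        perf_event_header hdr;
        ring_copy(&hdr, data, data_size, tail, sizeof(hdr));
        if (hdr.size < sizeof(hdr)) break;  // malformed record: stop, don't spin
        if (hdr.type == PERF_RECORD_SAMPLE) {
            std::uint64_t ip = 0;
            ring_copy(&ip, data, data_size, tail + sizeof(hdr), sizeof(ip));
            ips.push_back(ip);
        }
        tail += hdr.size;
    }
    __atomic_store_n(&meta->data_tail, tail, __ATOMIC_RELEASE);
    return ips;
}

// Outline of the surrounding steps (not part of the sketch above):
//   fd   = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
//   ring = mmap(nullptr, (1 + 8) * page_size, PROT_READ | PROT_WRITE,
//               MAP_SHARED, fd, 0);
//   ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
//   ... poll() on fd, then call
//   drain_sample_ips(static_cast<perf_event_mmap_page*>(ring),
//                    static_cast<const char*>(ring) + page_size,
//                    8 * page_size);
```

The collected addresses are raw instruction pointers; turning them into function names is a separate step, for example with GDB as mentioned above.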




Another option is to use SystemTap. You will need empty implementations of start_profiling() and end_profiling() in your program, so that SystemTap can switch profiling on and off with something like this:

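global traceme, prof;

# Switch sampling on when the marked region is entered...
probe process("/path/to/your/executable").function("start_profiling") {
    traceme = 1;
}

# ...and off again when it is left.
probe process("/path/to/your/executable").function("end_profiling") {
    traceme = 0;
}

# type(4) is PERF_TYPE_RAW; take one sample every 10000 events.
probe perf.type(4).config(/* RAW value of perf event */).sample(10000) {
    # Only record samples taken inside the marked region.
    if (traceme)
        prof[usymname(uaddr())] <<< 1;
}

# Print the number of samples per function, sorted by symbol name.
probe end {
    foreach([sym+] in prof) {
        printf("%16s %d\n", sym, @count(prof[sym]));
    }
}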
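On the C++ side, the markers that the script probes could look like the sketch below. The noinline attribute, the extern "C" linkage and the empty asm statement are my assumptions (they are GCC/Clang specific) to keep the calls from being inlined or optimized away and to keep the symbol names unmangled. profiled_sum is a hypothetical stand-in for the code you actually care about. As far as I know, .function probes also need debug info, so build with -g.

```cpp
#include <vector>

extern "C" {
// Empty marker functions: they exist only so SystemTap has something to
// attach to. The empty asm statement is there so the compiler does not
// treat the calls as removable.
__attribute__((noinline)) void start_profiling() { asm volatile("" ::: "memory"); }
__attribute__((noinline)) void end_profiling() { asm volatile("" ::: "memory"); }
}

// Hypothetical workload: only the loop between the two markers falls
// inside the region that the script samples.
long long profiled_sum(const std::vector<int>& values) {
    long long total = 0;
    start_profiling();
    for (int v : values) {
        total += v;
    }
    end_profiling();
    return total;
}
```

Because the markers do no work themselves, the program behaves the same whether or not the SystemTap script is attached.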

      
