Prefetching on AMD 15h

According to the BKDG for AMD Family 15h (p. 588), hardware prefetching can be disabled by setting some bits of MSRC001_1022:

MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits    -->  Description
63:16   -->  Reserved.
15      -->  DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14      -->  Reserved.
13      -->  DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher. 
12:10   -->  Reserved.
9:5     -->  Reserved.
4       -->  DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads. 
3:0     -->  Reserved.

      

To disable all prefetching, I wrote 0xA008 to that MSR. I did it for all 32 cores with:

[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...
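
As a cross-check on the value, here is a minimal Python sketch that builds and decodes a DC_CFG value from the bit positions in the table above. The names `DC_CFG_BITS`, `encode` and `decode` are made up for illustration and do not belong to any tool. By that table, bits 15 and 13 alone give 0xA000 and adding bit 4 gives 0xA010; the 0xA008 written here additionally sets bit 3, which the table lists as reserved.

```python
# Bit positions copied from the DC_CFG table quoted above.
# The dictionary and function names are illustrative only.
DC_CFG_BITS = {
    "DisPfHwForSw": 15,
    "DisHwPf": 13,
    "DisSpecTlbRld": 4,
}


def encode(names):
    """Return the MSR value with the named DC_CFG bits set."""
    value = 0
    for name in names:
        value |= 1 << DC_CFG_BITS[name]
    return value


def decode(value):
    """Return (named bits that are set, leftover bits not named in the table)."""
    named = [name for name, bit in DC_CFG_BITS.items() if (value >> bit) & 1]
    leftover = value & ~encode(DC_CFG_BITS)
    return named, leftover


if __name__ == "__main__":
    # Value for the two prefetch-related bits only.
    print(hex(encode(["DisPfHwForSw", "DisHwPf"])))
    # Decode the value that was actually written with wrmsr.
    named, leftover = decode(0xA008)
    print(named, hex(leftover))
```

A non-zero leftover means the value touches bits the table does not name, which is worth checking against the BKDG before writing it to the MSR.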

      

However, when I run the benchmark under perf, the prefetch counts are nonzero!

[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
 Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
    55,341,597,193 L1-dcache-loads:uk
     1,047,662,614 L1-dcache-prefetches:uk
                 0 L1-dcache-prefetch-misses:uk
      35.921618464 seconds time elapsed

      

I expected to see 0 for L1-dcache-prefetches. Shouldn't it be?

How can I debug the counters to see how they map to MSRs?



1 answer


The mapping from perf's synthetic event names (as listed by perf list) to hardware counters is defined in the kernel sources, inside the perf_events subsystem, for many CPUs. For AMD, it is in the file arch/x86/events/amd/core.c. In kernel version 4.8, the cache events for AMD CPUs are mapped to processor-specific constants, which are written to the PMU MSRs, as follows:

http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c



static __initconst const u64 amd_hw_cache_event_ids
 ... =  {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
        [ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts  */
        [ C(RESULT_MISS)   ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches  */
        [ C(RESULT_MISS)   ] = 0x0081, /* Instruction cache misses   */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS)   ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS)   ] = 0,
    },
 },
 [ C(LL  ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS)   ] = 0x037E, /* L2 Cache Misses : IC+DC     */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback           */
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
 },

...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;

    x86_pmu = amd_pmu;

    ret = amd_core_pmu_init();
    ...

    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}
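
To make the mapping concrete, here is a minimal Python sketch (not kernel code) that follows one event name through the table above. It assumes the generic cache-event encoding documented in perf_event_open(2), where config = cache id | (op id << 8) | (result id << 16), and the AMD raw-event layout in which bits 7:0 hold the event select, bits 15:8 the unit mask, and bits 35:32 the upper event-select bits. All helper names are hypothetical, and only the L1D rows of the excerpt are copied.

```python
# Generic PERF_TYPE_HW_CACHE ids, as documented in perf_event_open(2).
CACHE = {"L1D": 0, "L1I": 1, "LL": 2}
OP = {"READ": 0, "WRITE": 1, "PREFETCH": 2}
RESULT = {"ACCESS": 0, "MISS": 1}

# L1D rows copied from the amd_hw_cache_event_ids excerpt above.
AMD_L1D = {
    ("READ", "ACCESS"): 0x0040,      # Data Cache Accesses
    ("READ", "MISS"): 0x0141,        # Data Cache Misses
    ("PREFETCH", "ACCESS"): 0x0267,  # Data Prefetcher :attempts
    ("PREFETCH", "MISS"): 0x0167,    # Data Prefetcher :cancelled
}


def hw_cache_config(cache, op, result):
    """Generic config value that perf passes for a cache event."""
    return CACHE[cache] | (OP[op] << 8) | (RESULT[result] << 16)


def split_amd_event(raw):
    """Split an AMD raw event value into (event select, unit mask)."""
    event = (raw & 0xFF) | (((raw >> 32) & 0xF) << 8)
    umask = (raw >> 8) & 0xFF
    return event, umask


if __name__ == "__main__":
    # L1-dcache-prefetches is the L1D / PREFETCH / ACCESS entry.
    print(hex(hw_cache_config("L1D", "PREFETCH", "ACCESS")))
    for (op, result), raw in AMD_L1D.items():
        event, umask = split_amd_event(raw)
        print(f"L1D {op} {result}: event=0x{event:02x} umask=0x{umask:02x} raw=r{raw:x}")
```

Under these assumptions, L1-dcache-prefetches corresponds to event select 0x67 with unit mask 0x02, so counting the raw event r267 with perf stat -e should report the same counter, and the event description for 0x67 in the BKDG is the place to check what it actually counts.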

      
