Prefetching on AMD 15h
According to the BKDG for AMD family 15h (p. 588), hardware prefetching can be disabled by setting certain bits in MSRC001_1022:
MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits --> Description
63:16 --> Reserved.
15 --> DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14 --> Reserved.
13 --> DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher.
12:10 --> Reserved.
9:5 --> Reserved.
4 --> DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads.
3:0 --> Reserved.
To disable all prefetching, I have to write 0xA008 to that MSR. I did this on all 32 cores with:
[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...
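As a cross-check of the value written above against the DC_CFG table, here is a minimal Python sketch. The helper names `build_mask` and `decode` are hypothetical and not part of msr-tools; the bit positions are copied from the table. Under that table, bits 15 and 13 alone give 0xA000, whereas 0xA008 additionally sets bit 3, which the table lists as reserved, and leaves DisSpecTlbRld (bit 4) clear.

```python
# Bit positions of the documented DC_CFG (MSRC001_1022) fields,
# copied from the table above. Everything else is treated as reserved.
DC_CFG_FIELDS = {
    "DisPfHwForSw": 15,
    "DisHwPf": 13,
    "DisSpecTlbRld": 4,
}


def build_mask(names):
    """OR together the bits for the given field names (hypothetical helper)."""
    mask = 0
    for name in names:
        mask |= 1 << DC_CFG_FIELDS[name]
    return mask


def decode(value):
    """Return (named fields set, set bit positions the table lists as reserved)."""
    by_bit = {bit: name for name, bit in DC_CFG_FIELDS.items()}
    named, reserved = [], []
    for bit in range(64):
        if value & (1 << bit):
            if bit in by_bit:
                named.append(by_bit[bit])
            else:
                reserved.append(bit)
    return named, reserved


if __name__ == "__main__":
    print(hex(build_mask(["DisPfHwForSw", "DisHwPf"])))
    print(decode(0xA008))
```

If the table is complete, the two prefetch-related bits on their own would be written as 0xA000, and adding DisSpecTlbRld would give 0xA010.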
However, when I run the benchmark under perf, the prefetch counts are nonzero!
[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
55,341,597,193 L1-dcache-loads:uk
1,047,662,614 L1-dcache-prefetches:uk
0 L1-dcache-prefetch-misses:uk
35.921618464 seconds time elapsed
I expected L1-dcache-prefetches to be 0. Shouldn't it be?
How can I find out which hardware events (MSR values) these counters are mapped to?
The mapping from perf's symbolic event names (as shown by perf list) to hardware counters is defined, for many CPUs, in the kernel sources of the perf_events subsystem. For AMD, it is in the file arch/x86/events/amd/core.c. In the 4.8 kernel, the generic cache events for AMD CPUs are mapped to processor-specific constants, which are written to the PMU MSRs, as follows:
http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c
static __initconst const u64 amd_hw_cache_event_ids
... = {
    [ C(L1D) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses */
            [ C(RESULT_MISS) ] = 0x0141, /* Data Cache Misses */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = 0,
            [ C(RESULT_MISS) ] = 0,
        },
        [ C(OP_PREFETCH) ] = {
            [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts */
            [ C(RESULT_MISS) ] = 0x0167, /* Data Prefetcher :cancelled */
        },
    },
    [ C(L1I ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches */
            [ C(RESULT_MISS) ] = 0x0081, /* Instruction cache misses */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = -1,
            [ C(RESULT_MISS) ] = -1,
        },
        [ C(OP_PREFETCH) ] = {
            [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
            [ C(RESULT_MISS) ] = 0,
        },
    },
    [ C(LL ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
            [ C(RESULT_MISS) ] = 0x037E, /* L2 Cache Misses : IC+DC */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback */
            [ C(RESULT_MISS) ] = 0,
        },
        [ C(OP_PREFETCH) ] = {
            [ C(RESULT_ACCESS) ] = 0,
            [ C(RESULT_MISS) ] = 0,
        },
    },
...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;
    x86_pmu = amd_pmu;
    ret = amd_core_pmu_init();
    ...
    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}
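To see what the constants in the listing above select, here is a rough Python sketch of the two steps involved. It rests on two assumptions that are not stated in the listing itself: that generic cache events are encoded as cache id | (op id << 8) | (result id << 16), as described in perf_event_open(2), and that an AMD raw event code carries the event select in bits 7:0 (plus 35:32) and the unit mask in bits 15:8. The names `generic_config`, `split_amd_raw` and `AMD_L1D` are illustrative only; the table values are copied from the L1D rows above.

```python
# Generic perf cache-event indices (assumed from the perf_event_open(2) ABI).
CACHE = {"L1D": 0, "L1I": 1, "LL": 2}
OP = {"READ": 0, "WRITE": 1, "PREFETCH": 2}
RESULT = {"ACCESS": 0, "MISS": 1}

# L1D rows copied from amd_hw_cache_event_ids in the listing above.
AMD_L1D = {
    ("READ", "ACCESS"): 0x0040,
    ("READ", "MISS"): 0x0141,
    ("PREFETCH", "ACCESS"): 0x0267,
    ("PREFETCH", "MISS"): 0x0167,
}


def generic_config(cache, op, result):
    """Config value of a PERF_TYPE_HW_CACHE event (assumed encoding)."""
    return CACHE[cache] | (OP[op] << 8) | (RESULT[result] << 16)


def split_amd_raw(code):
    """Split an AMD raw event code into (event select, unit mask).

    Assumed layout: event select in bits 7:0 and 35:32, unit mask in bits 15:8.
    """
    event = (code & 0xFF) | (((code >> 32) & 0xF) << 8)
    umask = (code >> 8) & 0xFF
    return event, umask


if __name__ == "__main__":
    # L1-dcache-prefetches and L1-dcache-prefetch-misses, step by step.
    for result in ("ACCESS", "MISS"):
        raw = AMD_L1D[("PREFETCH", result)]
        event, umask = split_amd_raw(raw)
        print(hex(generic_config("L1D", "PREFETCH", result)),
              hex(raw), hex(event), hex(umask))
```

Read this way, L1-dcache-prefetches on this kernel ends up as event select 0x67 with unit mask 0x02, and L1-dcache-prefetch-misses as the same event with unit mask 0x01. The same event should be requestable directly with perf's raw syntax (for example r0267), and what event 0x67 actually counts on family 15h is worth checking against the BKDG rather than taking from the comments in the table.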