Why is an array whose size is a power of 2 slower? Why do I gain performance with -rdynamic?

I have a simple array operation in a for loop that I run on arrays of doubles of different sizes (from 16 elements up to very large). I do this several times:

for(int i = 1; i < n-1; i++){
    target[i] = (source[i-1]+source[i]+source[i+1])*0.5;
}

      

I compiled it with "-O3 -march=native" and measured the speed. Then, for unrelated reasons, I tried adding "-rdynamic", which sped things up a lot, as you can see in the plot. "cmake" in the legend refers to the build with "-rdynamic" added. This only works on an i7-4790 processor; I was unable to reproduce it on an AMD Phenom II X6 1045T at all.

I certainly do not understand why -rdynamic would produce such a speedup. (GLOPS = number of array cells updated per second, in billions.) Why do I get this speedup? And why not on the AMD processor?

Note that each data point is the average of ten measurements, in both cases.
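For reference, a minimal sketch of how such a measurement could be set up. This is not the original benchmark program: the function names, the sweep count, and the use of std::vector and std::chrono::steady_clock are assumptions made for illustration only.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// The update loop from the question, wrapped in a function (name is hypothetical).
void smooth(const std::vector<double>& source, std::vector<double>& target) {
    const std::size_t n = source.size();
    for (std::size_t i = 1; i + 1 < n; ++i) {
        target[i] = (source[i - 1] + source[i] + source[i + 1]) * 0.5;
    }
}

// Number of cells one sweep updates: every cell except the two boundary cells.
std::size_t updated_cells(std::size_t n) {
    return n < 2 ? 0 : n - 2;
}

// Billions of cell updates per second, averaged over `runs` timed runs.
// Each run performs `sweeps` sweeps over an array of n doubles.
double updates_per_second_billions(std::size_t n, int sweeps, int runs) {
    if (n < 3 || sweeps <= 0 || runs <= 0) return 0.0;
    std::vector<double> source(n, 1.0), target(n, 0.0);
    volatile double sink = 0.0;  // discourages the compiler from dropping the work
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        for (int s = 0; s < sweeps; ++s) {
            smooth(source, target);
            sink = target[n / 2];
        }
        const auto stop = std::chrono::steady_clock::now();
        const double seconds = std::chrono::duration<double>(stop - start).count();
        const double updates = static_cast<double>(updated_cells(n)) * sweeps;
        total += updates / seconds / 1e9;
    }
    (void)sink;
    return total / runs;
}
```

Calling updates_per_second_billions(n, 100000, 10) for each array size n would give one data point per size, averaged over ten runs as described above. For very small arrays the sweep count has to be large enough that a single run takes much longer than the timer resolution.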

Another interesting observation: at least at the beginning, while the array still fits into the L1 cache, I see these performance drops. Interestingly, they occur whenever my array size is a power of 2. I think it has something to do with the L2 cache, but I have absolutely no idea what or why. Maybe some cache conflicts or alignment?

[Plot: measured GLOPS for the different array sizes, with and without "-rdynamic"]

EDIT: I've now built it with only: g++ -O3 -march=native program.cpp -rdynamic. The curve labeled "cmake" is the same as adding "-rdynamic".

EDIT 2: Removed the cmake story from the question entirely. [Peter]


1 answer


I have no idea why -rdynamic would lead to a speedup. But regarding your second question, check out Agner Fog's guide "Optimizing software in C++": http://www.agner.org/optimize/optimizing_cpp.pdf . Take a look at section 9.2, where he talks about the critical stride. It may be applicable in this situation.
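To make the idea concrete, here is a minimal sketch of the critical stride calculation, which is the cache size divided by the number of ways. The cache geometry below (32 KiB, 8-way, 64-byte lines) is an assumption typical of an L1 data cache on recent Intel desktop processors, not something measured on your machine, and the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache geometry; adjust to the actual processor.
struct CacheGeometry {
    std::size_t size_bytes = 32 * 1024;  // assumption: 32 KiB cache
    std::size_t ways = 8;                // assumption: 8-way set associative
    std::size_t line_bytes = 64;         // assumption: 64-byte cache lines
};

// Critical stride = cache size / number of ways.
// Addresses that are a multiple of this many bytes apart map to the same set.
std::size_t critical_stride(const CacheGeometry& c) {
    assert(c.ways > 0 && c.line_bytes > 0);
    return c.size_bytes / c.ways;
}

// Index of the cache set that a given address maps to.
std::size_t cache_set(std::uintptr_t addr, const CacheGeometry& c) {
    const std::size_t sets = critical_stride(c) / c.line_bytes;
    return (addr / c.line_bytes) % sets;
}

// True when two addresses compete for the same cache set.
bool same_set(std::uintptr_t a, std::uintptr_t b, const CacheGeometry& c) {
    return cache_set(a, c) == cache_set(b, c);
}
```

With these assumed numbers the critical stride is 4096 bytes, which is 512 doubles. If source and target happen to lie a multiple of 4096 bytes apart, which is more likely when the array size is a power of 2, then source[i] and target[i] always fall into the same set. Whether that is really what causes the drops in your plot would have to be checked, for example by padding one of the arrays by a few cache lines and measuring again.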


