Why is std::inner_product slower than a naive implementation?
This is my naive implementation of the dot product:
float simple_dot(int N, float *A, float *B) {
    float dot = 0;
    for (int i = 0; i < N; ++i) {
        dot += A[i] * B[i];
    }
    return dot;
}
And this is the version using the C++ standard library:
float library_dot(int N, float *A, float *B) {
    return std::inner_product(A, A + N, B, 0);
}
I've run some benchmarks (code here: https://github.com/ijklr/sse ) and the library version is much slower. My compiler flags are -Ofast -march=native.
Your two functions don't do the same thing. The algorithm uses an accumulator whose type is deduced from the initial value, which in your case (0) is int. Accumulating floats into an int not only takes longer than accumulating into a float, it also produces a different result: each partial sum is truncated to an integer. Use the initial value 0.0f, or equivalently float{}, to match your original loop.
(Note that std::accumulate behaves the same way in this regard.)