Why is std::inner_product slower than a naive implementation?

This is my naive implementation of the dot product:

float simple_dot(int N, float *A, float *B) {
    float dot = 0;
    for(int i = 0; i < N; ++i) {
        dot += A[i] * B[i];
    }

    return dot;
}

And this is the version using the C++ standard library:

float library_dot(int N, float *A, float *B) {
    return std::inner_product(A, A+N, B, 0);
}

I've done some tests (the code is here: https://github.com/ijklr/sse ) and the library version is much slower. My compiler flags are -Ofast -march=native.


1 answer


Your two functions don't do the same thing. std::inner_product uses an accumulator whose type is deduced from the initial value, which in your case (0) is int. Accumulating floats into an int doesn't just take longer than accumulating into a float, it also gives a different result.

Use the seed value 0.0f, or equivalently float{}, to match your original loop.



(Note that std::accumulate is very similar in this regard.)
