Why is std::inner_product slower than a naive implementation?
This is my naive implementation of the dot product:
float simple_dot(int N, float *A, float *B) {
    float dot = 0;
    for (int i = 0; i < N; ++i) {
        dot += A[i] * B[i];
    }
    return dot;
}
And this is the version using the C++ standard library:
float library_dot(int N, float *A, float *B) {
    return std::inner_product(A, A + N, B, 0);
}
I've run some benchmarks (code here: https://github.com/ijklr/sse ) and the library version is much slower. My compiler flags are -Ofast -march=native.
Your two functions don't do the same thing. The algorithm uses an accumulator whose type is deduced from the initial value, which in your case (0) is int. Accumulating floats into an int not only takes longer than accumulating into a float, it also produces a different result: each partial sum is truncated to an integer. Use the initial value 0.0f, or equivalently float{}, to match your original loop.
(Note that std::accumulate behaves the same way in this regard.)