Correct way to check size of std :: vector inside loop
I have a std :: vector that I need to loop frequently. I see two ways to do this
First way:
const size_t SIZE = myVec.size();
for (size_t i = 0; i < SIZE; i++)
{
myVec[i] = 0;
}
Second way:
for (size_t i = 0; i < myVec.size(); i++)
{
myVec[i] = 0;
}
Is the first more efficient than the second, or do modern compilers know to optimize the second implementation to make it as efficient as the first?
FWIW, I'm on Visual Studio 2013.
source to share
The first version will often be faster even with modern compilers. It is difficult for the optimizer to prove that the size does not change due to anti-aliasing with the location written in the body of the loop, and therefore in many cases the second version will have to recalculate the size at each iteration of the loop.
I measured this in Visual Studio 2013 release and found performance difference for both 32 and 64 bit code. Both versions are easily beaten by std :: fill (). These measurements are averages of over 1000 runs with 10 million element vectors (increasing the cardinality to a billion reduces the performance difference somewhat as memory access becomes more of a bottleneck).
Method Time relative to uncached for loop
x86 x64
uncached for loop 1.00 1.00
cached for loop 0.70 0.98
std::fill() 0.42 0.57
Baseline cache size for loop code:
const auto size = vec.size();
for (vector<int>::size_type i = 0; i < size; ++i) {
vec[i] = val;
}
Will compile into this loop body (x86 release):
00B612C0 mov ecx,dword ptr [esi]
00B612C2 mov dword ptr [ecx+eax*4],edi
00B612C5 inc eax
00B612C6 cmp eax,edx
00B612C8 jb forCachedSize+20h (0B612C0h)
Whereas a version that doesn't cache the size of the vector:
for (vector<int>::size_type i = 0; i < vec.size(); ++i) {
vec[i] = val;
}
Compiles for this, which revises vec.size () every time through the loop:
00B612F0 mov dword ptr [edx+eax*4],edi
00B612F3 inc eax
00B612F4 mov ecx,dword ptr [esi+4] <-- Load vec.end()
00B612F7 mov edx,dword ptr [esi] <-- Load vec.begin()
00B612F9 sub ecx,edx <-- ecx = vec.end() - vec.begin()
00B612FB sar ecx,2 <-- exc = (vec.end() - vec.begin()) / sizeof(int)
00B612FE cmp eax,ecx
00B61300 jb forComputedSize+20h (0B612F0h)
source to share
I prefer to write my loops as in the first case. In the second case, and std::vector::size()
you can pay for a few additional workloads in the compiler-optimized version, but when you start working with more complex data structures, these simple workloads can become expensive tree searches.
Even with preference, the context sometimes requires you to write your loop in the second form. In the first case, hints that a mutation in the container size does not occur, since the container size is checked once. When you read the second case, the container size is checked every iteration, which tells the user that the body may be mutating the container size.
If you are mutating a container in your loop body, use the second form and comment that you are mutating your container and want to check its size. Otherwise, choose the first one.
source to share
Getting the size of a vector is always constant.
If your algorithm might be less efficient, you use myVec[i]
for each index. It is going to retrieve the pointer and add "i" to it each time. Pointer arithmetic will probably outperform it for performance, and if you are using vector
iterator
it as it will most likely be implemented as a pointer it will probably outperform your loop.
If you are setting all values to 0
, you can probably trump even that with a single function call, not a loop, in this case
myVec.assign( myVec.size(), 0 );
source to share