Correct way to check size of std :: vector inside loop

I have a std :: vector that I need to loop frequently. I see two ways to do this

First way:

const size_t SIZE = myVec.size();
for (size_t i = 0; i < SIZE; i++)
{
    myVec[i] = 0;
}

      

Second way:

for (size_t i = 0; i < myVec.size(); i++)
{
    myVec[i] = 0;
}

      

Is the first more efficient than the second, or do modern compilers know to optimize the second implementation to make it as efficient as the first?

FWIW, I'm on Visual Studio 2013.

+3


source to share


5 answers


The first version will often be faster even with modern compilers. It is difficult for the optimizer to prove that the size does not change due to anti-aliasing with the location written in the body of the loop, and therefore in many cases the second version will have to recalculate the size at each iteration of the loop.

I measured this in Visual Studio 2013 release and found performance difference for both 32 and 64 bit code. Both versions are easily beaten by std :: fill (). These measurements are averages of over 1000 runs with 10 million element vectors (increasing the cardinality to a billion reduces the performance difference somewhat as memory access becomes more of a bottleneck).

Method                   Time relative to uncached for loop
                         x86      x64

uncached for loop        1.00     1.00
cached for loop          0.70     0.98
std::fill()              0.42     0.57

      

Baseline cache size for loop code:

const auto size = vec.size();
for (vector<int>::size_type i = 0; i < size; ++i) {
    vec[i] = val;
}

      

Will compile into this loop body (x86 release):



00B612C0  mov         ecx,dword ptr [esi]  
00B612C2  mov         dword ptr [ecx+eax*4],edi  
00B612C5  inc         eax  
00B612C6  cmp         eax,edx  
00B612C8  jb          forCachedSize+20h (0B612C0h)  

      

Whereas a version that doesn't cache the size of the vector:

for (vector<int>::size_type i = 0; i < vec.size(); ++i) {
    vec[i] = val;
}

      

Compiles for this, which revises vec.size () every time through the loop:

00B612F0  mov         dword ptr [edx+eax*4],edi  
00B612F3  inc         eax  
00B612F4  mov         ecx,dword ptr [esi+4]            <-- Load vec.end()
00B612F7  mov         edx,dword ptr [esi]              <-- Load vec.begin()
00B612F9  sub         ecx,edx                          <-- ecx = vec.end() - vec.begin()
00B612FB  sar         ecx,2                            <-- exc = (vec.end() - vec.begin()) / sizeof(int)
00B612FE  cmp         eax,ecx  
00B61300  jb          forComputedSize+20h (0B612F0h)  

      

+2


source


I prefer to write my loops as in the first case. In the second case, and std::vector::size()

you can pay for a few additional workloads in the compiler-optimized version, but when you start working with more complex data structures, these simple workloads can become expensive tree searches.

Even with preference, the context sometimes requires you to write your loop in the second form. In the first case, hints that a mutation in the container size does not occur, since the container size is checked once. When you read the second case, the container size is checked every iteration, which tells the user that the body may be mutating the container size.



If you are mutating a container in your loop body, use the second form and comment that you are mutating your container and want to check its size. Otherwise, choose the first one.

+1


source


As noted here , the complexity vector<T>::size

should be constant across all compilers, so it doesn't really matter which one you use.

0


source


In any decent modern C ++ compiler, the two versions won't make any difference from a performance standpoint, because the optimizer will optimize for any impairments. However, I am using the following version:

for (size_t i(0), ie(myVec.size()); i < ie; ++i) {
    // do stuff
}

      

0


source


Getting the size of a vector is always constant.

If your algorithm might be less efficient, you use myVec[i]

for each index. It is going to retrieve the pointer and add "i" to it each time. Pointer arithmetic will probably outperform it for performance, and if you are using vector

iterator

it as it will most likely be implemented as a pointer it will probably outperform your loop.

If you are setting all values ​​to 0

, you can probably trump even that with a single function call, not a loop, in this case

myVec.assign( myVec.size(), 0 );

0


source







All Articles