Compiler optimization of a temporary object passed as a "const &" argument to a function in a loop?
I have a thread that loops forever and calls std::this_thread::sleep_for for a 10 ms delay. The duration is a temporary object, std::chrono::milliseconds(10). The delay call looks "normal" and "typical", following some example code. Looking a little closer, however, it is clear that a temporary object is created and destroyed once in every iteration.
// Loop A.
for (;;)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    // Do something.
}
Now, if the duration object is created outside the loop (as a constant), it is constructed only once for all iterations. See the code below.
// Loop B.
const auto t = std::chrono::milliseconds(10);
for (;;)
{
    std::this_thread::sleep_for(t);
    // Do something.
}
Question: Since std::this_thread::sleep_for takes its argument by "const &", would any C++ compiler optimize the temporary duration object inside Loop A into something like Loop B?
I tried the simple test program below. The result shows that VC++ 2013 does not optimize away the "const &" temporary object.
#include <iostream>
#include <thread>
using namespace std;
class A {
public:
    A() { cout << "Ctor.\n"; }
    void ReadOnly() const {} // Read-only method.
};
static void Foo(const A & a)
{
    a.ReadOnly();
}
int main()
{
    cout << "Temp object:\n";
    for (int i = 0; i < 3; ++i)
    {
        Foo(A());
    }
    cout << "Optimized:\n";
    const auto ca = A();
    for (int i = 0; i < 3; ++i)
    {
        Foo(ca);
    }
}
/* VC2013 Output:
Temp object:
Ctor.
Ctor.
Ctor.
Optimized:
Ctor.
*/
MSVC and other modern compilers are great at optimizing temporary objects in loops.
The problem in your example is that your constructor has a side effect. According to the C++ standard (the "as-if" rule), the compiler is not allowed to optimize away the creation/destruction of your temporary object, because the program would no longer produce the same observable behavior (i.e. printing "Ctor." three times).
The picture is completely different if you no longer use cout. Of course, you will need to look at the generated assembly code to verify the optimization.
Example:
#include <iostream>
using namespace std;
class A {
public:
    static int k;
    A() { k++; }
    void ReadOnly() const {} // Read-only method.
};
int A::k = 0;
// Foo is unchanged from the question.
static void Foo(const A & a)
{
    a.ReadOnly();
}
int main()
{
    for (int i = 0; i < 3; ++i)
        Foo(A());           // k++ is a side effect, but not yet observable
    volatile int x = A::k;  // volatile can't be optimized away
    const auto ca = A();
    for (int i = 0; i < 3; ++i)
        Foo(ca);
    x = A::k;               // volatile can't be optimized away
    cout << x << endl;
}
The optimizer sees perfectly well that it is always the same static variable being incremented, and that it is not used anywhere else. So here is an extract of the generated assembly code:
mov eax, DWORD PTR ?k@A@@2HA ; A::k   <=== load k
add eax, 3                            <=== add 3 to it (NO LOOP !!!)
mov DWORD PTR ?k@A@@2HA, eax ; A::k   <=== store k
mov DWORD PTR _x$[ebp], eax           <=== store a copy in x
inc eax                               <=== increment k
                                      <=== (no loop since function doesn't perform anything)
mov DWORD PTR ?k@A@@2HA, eax ; A::k   <=== store it
mov DWORD PTR _x$[ebp], eax           <=== copy it to x
Of course, you need to compile in release mode (with optimizations enabled).
As you can see, the compiler is very smart. So let it do its job, concentrate on developing your code, and keep in mind: premature optimization is the root of all evil ;-)
Assuming that the compiler "understands" what the constructor is doing (in other words, the definition of the constructor is available in the translation unit, that is, in the source file or in one of the header files it includes), the compiler should remove unnecessary calls to a constructor that has no side effects.
Since printing something is a very definite side effect of the constructor of A, the compiler clearly cannot optimize it away. So the compiler is doing exactly the "right" thing here. It would be very bad if you had, for example, a lock wrapper whose constructor takes a lock and whose destructor releases it, and the compiler decided to move this:
for(...)
{
    LockWrapper lock_it(theLock);
    ... some code here
}
outside the loop, because although the overhead of taking and releasing the lock would be lower, the semantics of the code would change: the lock would potentially be held for much longer, which affects OTHER code using the same lock, for example in a different thread.