Why does adding a statement that is never executed cause performance degradation in my code?
Background:
I am writing a lock-free stack and I am working on optimizing it.
I found that adding sleeps after failed compare_exchange operations results in significantly higher throughput when testing in highly parallel scenarios:
void stack::push(node* n)
{
    node old_head, new_head{ n };
    n->n_ = nullptr;
    if (head_.compare_exchange_weak(old_head, new_head))
        return;
    for (;;)
    {
        n->n_ = old_head.n_;
        new_head.create_id(old_head);
        if (head_.compare_exchange_weak(old_head, new_head))
            return;
        // testing conditions _never_ reach here, so why does this line make the program slower??
        std::this_thread::sleep_for(std::chrono::nanoseconds(5));
        // debug break is used to confirm execution never reaches here
        __debugbreak();
    }
}
(The complete code can be found here on GitHub.)
I sleep only after compare_exchange has failed twice; the first call effectively acts as a load(), except when the stack is empty. Sounds good? It's an easy optimization.
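To illustrate why the first call behaves like a load: when a compare-exchange fails, it writes the value currently stored in the atomic into the `expected` argument. Below is a minimal sketch of that behavior on a plain `std::atomic<int>`. The helper name `observe_via_failed_cas` is hypothetical and not part of the stack code; it uses `compare_exchange_strong` so that a failure can only mean a mismatch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, for illustration only: tries to swap in `desired`
// using a guess for the current value, and reports what was observed.
inline int observe_via_failed_cas(std::atomic<int>& value, int wrong_guess, int desired)
{
    int expected = wrong_guess;
    if (value.compare_exchange_strong(expected, desired))
        return desired;   // the guess matched, so the swap happened
    // On failure, `expected` has been overwritten with the current value,
    // which is why a failed exchange doubles as a load.
    return expected;
}
```

With an atomic holding 42 and a guess of 0, the call returns 42 and leaves the atomic unchanged, much like a default-constructed old_head compared against a non-empty stack.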
Here's what I didn't expect:
Adding the sleep code significantly reduces throughput even in scenarios where the sleep code is never executed! This is confirmed by the added __debugbreak.
Example numbers:
test conditions:
----------------------
data_count = 1
loop_count = 100000000
thread_count = 1
sleep code commented out
-------------------------------
operations per second: 75357000
operations per second: 74487000
operations per second: 74571000
operations per second: 75357000
operations per second: 75843000
operations per second: 74183000
operations per second: 74822000
operations per second: 74321000
operations per second: 75301000
operations per second: 73991000
with sleep code
-------------------------------
operations per second: 60716000
operations per second: 61031000
operations per second: 61236000
operations per second: 60957000
operations per second: 60808000
operations per second: 60642000
operations per second: 60734000
operations per second: 60661000
operations per second: 60422000
operations per second: 61162000
This was with the latest version of Xcode 5. I see a similar difference in numbers when using Visual Studio 2013.
So what's going on here? Why does throughput drop significantly when I add code that never gets executed?
Adding the sleep changes the branch structure of the loop. Without the sleep, a single conditional jump goes back to the top of the loop when compare_exchange_weak returns false. With the sleep, there is a conditional branch to the function epilogue when compare_exchange_weak returns true, plus an unconditional jump back to the top of the loop after the sleep.
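The two loop shapes described above can be sketched side by side on a simple counter, which makes it easier to compare the generated assembly (for example with the compiler's assembly output option). This is only an illustration under stated assumptions: the function names `add_one_no_sleep` and `add_one_with_sleep` are hypothetical, a `std::atomic<int>` stands in for the stack head, and the exact branches emitted depend on the compiler and optimization level.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Shape 1: no back-off. The only thing a failed exchange needs to do is
// jump back to the top of the loop.
inline void add_one_no_sleep(std::atomic<int>& counter)
{
    int old_value = counter.load();
    for (;;)
    {
        // On failure, old_value is refreshed with the current value.
        if (counter.compare_exchange_weak(old_value, old_value + 1))
            return;
    }
}

// Shape 2: back-off after a failed exchange. Success now has to branch
// past the sleep code, and the sleep path ends with its own jump back.
inline void add_one_with_sleep(std::atomic<int>& counter)
{
    int old_value = counter.load();
    for (;;)
    {
        if (counter.compare_exchange_weak(old_value, old_value + 1))
            return;
        std::this_thread::sleep_for(std::chrono::nanoseconds(5));
    }
}
```

Both functions compute the same result; only the control flow around the success path differs, which is the difference the answer points to.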