Comparison performance in C++ (foo >= 0 vs. foo != 0)
I recently worked on a piece of code where performance is very important and essentially I have the following situation:
int len = some_very_big_number;
int counter = some_rather_small_number;

for( int i = len; i >= 0; --i ){
    while( counter > 0 && costly other stuff here ){
        /* do stuff */
        --counter;
    }
    /* do more stuff */
}
So, here I have a loop that runs very often, and for a certain number of iterations the while block will execute until the variable counter
reaches zero; after that the while loop will be skipped, because its first condition is false.
The question now is whether there is a performance difference between using
counter > 0
and counter != 0
?
I suspect there is; does anyone know the details on this?
In programming, the following statement is a sign that you are on the road to hell:
I recently worked on a piece of code where performance is very important.
Write your code in the cleanest, easiest-to-understand way. Period.
Once that is done, measure the running time. If it takes too long, measure the bottlenecks and speed up the biggest ones. Keep doing this until it is fast enough.
The list of projects that have failed or suffered catastrophic losses due to a misguided focus on blind optimization is long and tragic. Don't join them.
Is there a difference between counter > 0
and counter != 0
? It depends on the platform.
A very common processor family is the x86 chips Intel puts in our computers. On that CPU both comparisons compile to the same compare instruction, followed only by a different conditional jump, and I would assume they run at the same speed. To be sure, however, you will have to run your own test.
As Jim said, when in doubt, measure:

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
using namespace std;

int main()
{
    ptime Before = microsec_clock::universal_time(); // UTC now
    // do stuff here
    ptime After = microsec_clock::universal_time();  // UTC now
    time_duration delta_t = After - Before;          // how much time has passed?
    cout << delta_t.total_seconds() << endl;         // whole seconds elapsed
    cout << delta_t.fractional_seconds() << endl;    // fractional part, in microseconds
    return 0;
}

Here's a fairly neat way to measure time. Hope it helps.
OK, you can measure that, of course. However, comparisons like these are so fast that you will likely see more variation from context switches and OS scheduling than from this single line of code.
This smells like unnecessary, premature optimization. Get your program correct first, then optimize what you can show is slow. If you need more, profile and go from there.
I would add that on a modern processor the performance of this code will not be dominated by the comparison instruction, but by whether the branch is predicted well, since a misprediction wastes many more cycles than any integer operation.
For this code, loop unrolling is likely to be the biggest winner, but measure, measure, measure.
In general, they should be equivalent (both typically execute as single-cycle instructions/micro-ops). Your compiler might apply some special-case optimization that is difficult to predict from the source, which could make one of them slightly faster. In addition, testing equality is marginally more energy efficient than testing inequality (>), although the system-level effect is so small that it is not worth discussing.
Obviously the solution is to use the correct data type.
Make counter an unsigned int. Then it cannot be less than zero, your compiler knows this, and it is free to choose the optimal comparison.
Or you could just measure it.
You could also think about how the test would be implemented... (here we go off on a tangent)...
- less than zero: the sign bit will be set, so only one bit needs to be checked
- equal to zero: the whole value must be zero, so all bits must be checked
Of course, computers are funny things, and checking a single bit may well take longer than checking an entire value (whatever that costs on your platform).
You could just measure it...
And you might find that one is more optimal than the other (under the conditions in which you measured it). But your program will still run like a dog, because you spent all your time optimizing the wrong part of your code.
The best solution is to do what many large software companies do: blame the hardware for not being fast enough and encourage your customer to upgrade their hardware (which is clearly inferior, since your product is obviously not the problem).
I stumbled upon this question just now, 3 years after it was asked, so I'm not sure how helpful the answer will still be... However, I'm surprised no one has clearly stated that answering your question requires knowing two, and only two, things:
- which processor are you targeting
- which compiler are you working with
First, each processor has different instructions for testing. On a given processor, two similar comparisons may take different numbers of cycles. For example, you might have a 1-cycle instruction for gt (>), eq (==), or le (<=), but no 1-cycle instruction for other comparisons such as ge (>=). After a test you might execute a conditional instruction or, more commonly, as in your example code, take a jump. On most modern processors, jumps in turn take a variable number of cycles, depending on whether the conditional jump is taken or not, and predicted or not. When you write time-critical code in assembly, you can spend quite a bit of time figuring out how best to organize it to minimize the total cycle count, and you may end up with a solution tuned to how often the comparison comes out true versus false.
Which brings me to the second point: compilers, like human coders, try to arrange the code to account for the available instructions and their latencies. Their job is harder because some assumptions an assembly coder can rely on, such as "counter is small", are difficult (though not impossible) for a compiler to prove. For trivial cases like a loop counter, most modern compilers can at least recognize that the counter will always be positive, that != is then equivalent to >, and generate the best code accordingly. But, as many of the posts here have said, you will only know for sure if you measure, or if you inspect your assembly output and convince yourself it is the best you could do in assembly. And when you switch to a new compiler, you might get a different answer.