tan() takes twice as long as sin()/cos() with g++ 4.8.2

I work with algorithms that use a lot of math functions, and we recently ported the code from a Solaris platform to g++ 4.8.2 on an Ubuntu system.

Surprisingly, some of the algorithms took longer than before. The reason is that the function std::tan() takes twice as long as std::sin()/std::cos().

Replacing tan with sin/cos significantly reduced the computation time for the same results. I wonder why there is such a difference. Is it because of the standard library implementation? Shouldn't the tan function be more efficient?

I wrote a program to check the timing of the functions:

#include <cmath>
#include <iostream>
#include <chrono>

int main(int argc, char * argv[])
{
    using namespace std::chrono;

    auto start_tan = system_clock::now();

    for (int i = 0; i < 50000; ++i)
    {
        const double & a = static_cast<double>(i);
        const double & b = std::tan(a);
    }

    auto end_tan = system_clock::now();
    auto elapsed_time_tan = end_tan - start_tan;
    std::cout << "tan : ";
    std::cout << elapsed_time_tan.count() << std::endl;

    auto start_sincos = system_clock::now();

    for (int i =  0; i < 50000; ++i)
    {
        const double & a = static_cast<double>(i);
        const double & b = std::sin(a) / std::cos(a);
    }

    auto end_sincos = system_clock::now();
    auto elapsed_time_sincos = end_sincos - start_sincos;
    std::cout << "sincos : " << elapsed_time_sincos.count() << std::endl;

}

      

And indeed, I get the following timings without optimization:

tan : 8319960
sincos : 4736988

      

And with optimization (-O2):

tan : 294
sincos : 120

      

Does anyone have an idea what causes this behavior?

EDIT

I modified the program as suggested in @Basile Starynkevitch's answer:

#include <cmath>
#include <cstdlib>   // std::atoi
#include <iostream>
#include <chrono>

int main(int argc, char * argv[])
{
    using namespace std::chrono;

    if (argc != 2)
    {
        std::cout << "Need one and only argument : the number of iteration." << std::endl;
        return 1;
    }

    int nb_iter = std::atoi(argv[1]);
    std::cout << "Number of iteration programmed : " << nb_iter << std::endl;

    // Accumulate the results so the loop has an observable effect.
    double tan_sum = 0.0;
    auto start_tan = system_clock::now();
    for (int i = 0; i < nb_iter; ++i)
    {
        const double & a = static_cast<double>(i);
        const double b = std::tan(a);
        tan_sum += b;
    }

    auto end_tan = system_clock::now();
    auto elapsed_time_tan = end_tan - start_tan;
    std::cout << "tan : " << elapsed_time_tan.count() << std::endl;
    std::cout << "tan sum : " << tan_sum << std::endl;

    double sincos_sum = 0.0;
    auto start_sincos = system_clock::now();
    for (int i = 0; i < nb_iter; ++i)
    {
        const double & a = static_cast<double>(i);
        const double b = std::sin(a) / std::cos(a);
        sincos_sum += b;
    }

    auto end_sincos = system_clock::now();
    auto elapsed_time_sincos = end_sincos - start_sincos;
    std::cout << "sincos : " << elapsed_time_sincos.count() << std::endl;
    std::cout << "sincos sum : " << sincos_sum << std::endl;

}

      

And now I get similar timings with -O2 alone:

tan : 8345021
sincos : 7838740

      

But there is still a difference with -O2 -mtune=native, although both are faster:

tan : 5426201
sincos : 3721938

      

I will not use -ffast-math because I need IEEE compliance.


2 answers


You should not care about unoptimized code.

With optimizations enabled, the GCC compiler is probably throwing away the loop, since you don't do anything with the result. BTW, b should not be a reference (const double&) but a plain const double.

If you want a meaningful benchmark, try storing b (or summing it). And make the number of iterations (50,000) a runtime parameter, for example int nbiter = (argc>1)?atoi(argv[1]):1000;

You might want to pass -O2 -ffast-math -mtune=native as optimization flags to g++ (beware that -ffast-math does not follow the standard in some optimization details).

With these flags and with my changes:

double sumtan=0.0, sumsincos=0.0;
int nbiter = argc>1?atoi(argv[1]):10000;

      

and



for (int i = 0; i < nbiter; ++i)
{
    const double & a = static_cast<double>(i);
    const double  b = std::tan(a);
    sumtan += b;
}

      

and



for (int i =  0; i < nbiter; ++i)
{
    const double & a = static_cast<double>(i);
    const double  b = std::sin(a) / std::cos(a);
    sumsincos += b;
}

      

and



std::cout << "tan : "  << elapsed_time_tan.count() 
          << " sumtan=" << sumtan << std::endl;

      

and



std::cout << "sincos : " << elapsed_time_sincos.count() 
          << " sumsincos=" << sumsincos << std::endl;

      

compiled with GCC 4.9.2 using

 g++ -std=c++11 -O2 -Wall -ffast-math -mtune=native b.cc -o b.bin

      

I get pretty similar timings:

  % ./b.bin 1000000
  tan : 77158579 sumtan=-3.42432e+06
  sincos : 70219657 sumsincos=-3.42432e+06

      

This is on a 4-year-old desktop (Intel(R) Xeon(R) CPU X3430 @ 2.40GHz).

If compiling with clang++ 3.5.0:

tan : 78098229 sumtan=-3.42432e+06
sincos : 106817614 sumsincos=-3.42432e+06

      

PS. Timings (and relative performance) are different with -O3. Some processors have machine instructions for sin, cos and tan, but they may not be used (because the compiler or libm knows they are slower than a subroutine). GCC has builtins for these.



Check out the Intel developer guide. Trigonometric functions are not as accurate as other math functions on x86, so sin/cos will not give the same result as tan, which you must keep in mind if IEEE compliance is the reason for your question.

In terms of speedup, sin and cos can be obtained from the same instruction, if the compiler is not brain dead. Calculating tan with the same precision is a lot more work. That is also why the compiler cannot substitute sin/cos for tan on its own without breaking the standard.



Depending on whether those last decimal places are important to you or not, you may want to look at the question "What is the error of trigonometric instructions on x86?"
