What is the error of trigonometric instructions on x86?

Where can I find information on error ranges for trigonometric function instructions on x86 processors?

+2




3 answers


The question as asked is rarely the interesting one, and chances are you really want to know something else. So let me first answer two different questions:

How do I calculate a trigonometric function with a specific precision?

Just use a wider data type. On x86, if you want a double-precision result, do the calculation in 80-bit double extended precision and you are safe.



How do I get platform independent accuracy?

For this you need a specialized software solution such as MPFR.
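To illustrate the idea of a software reference that does not depend on the platform's math library, here is a minimal sketch. It uses Python's standard `decimal` module as a stand-in for MPFR (an assumption made only to keep the example self-contained), and the function name `sin_reference` is hypothetical. It does no argument reduction, so it is only meant for modest |x|.

```python
import math
from decimal import Decimal, localcontext

def sin_reference(x: float, digits: int = 50) -> float:
    """Sine of a double via a Taylor series in `digits`-digit decimal arithmetic.

    No argument reduction is performed, so keep |x| to a few units.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        xd = Decimal(x)                  # exact conversion of the binary double
        x2 = xd * xd
        term = total = xd                # first term of the series is x itself
        tiny = Decimal(10) ** -(digits - 5)
        k = 0
        while abs(term) > tiny:
            k += 1
            # next term: multiply by -x^2 / ((2k)(2k+1))
            term = -term * x2 / ((2 * k) * (2 * k + 1))
            total += term
        return float(total)              # single final rounding to double

# Compare the software reference with the platform's libm result.
print(sin_reference(0.5), math.sin(0.5))
```

Because every intermediate step carries far more digits than a double holds, the only rounding that matters is the final conversion, which is what makes the result reproducible across machines.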

So let me go back to your original question. Short answer: for small operands, the error should usually be within 1 ulp. For large operands, it gets worse. The only way to know for sure is to test it yourself, like this guy did. There is no reliable information from processor vendors.

+5




You can read the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, Section 8.3.10 "Transcendental Instruction Accuracy". It gives a precise formula, but also a more accessible statement:



With Pentium and later IA-32 processors, worst-case error for transcendental functions is less than 1 ulp when rounded to nearest (even) and less than 1.5 ulps when rounding in other modes.

+2




For Intel processors, the accuracy of the built-in transcendental instructions is documented in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, Section 8.3.10 "Transcendental Instruction Accuracy":

With Pentium and later IA-32 processors, worst-case error for transcendental functions is less than 1 ulp when rounded to nearest (even) and less than 1.5 ulps when rounding in other modes.

It should be noted that the 1 ulp error bound applies to the 80-bit double extended precision format, since all transcendental function instructions deliver extended-precision results. The issue noted by Stephen Canon in an earlier comment, namely the loss of accuracy relative to the mathematical reference for the trigonometric instructions FSIN, FCOS, FSINCOS, and FPTAN caused by argument reduction with a 66-bit machine approximation of π, is acknowledged by Intel. The manual gives the following guidance:

Regardless of the target precision (single, double, or double extended), it is safe to reduce the argument to a value smaller in absolute value than about 3π/4 for FSIN, and smaller than about 3π/8 for FCOS, FSINCOS, and FPTAN. [...] For example, accuracy measurements show that the double extended precision result of FSIN will not have errors larger than 0.72 ulp for |x| < 2.82 [...] Likewise, the double extended precision result of FCOS will not have errors larger than 0.82 ulp for |x| < 1.31 [...]
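To see why reduction against a 66-bit π matters, here is a small model using exact rational arithmetic. It is a sketch of the effect, not a description of the actual microcode: it simply assumes the reduction subtracts π rounded to 66 significant bits, and the names `PI66`, `modeled`, and `err_ulps` are made up for the illustration.

```python
import math
from fractions import Fraction

# pi to 50 decimal places, far more than this experiment needs.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# pi rounded to 66 significant bits. The leading bit has weight 2^1,
# so the last of the 66 bits has weight 2^-64.
PI66 = Fraction(round(PI * 2**64), 2**64)

x = Fraction(math.pi)        # the double nearest pi, as an exact rational

# For x this close to pi, sin(x) = sin(pi - x), which equals pi - x
# up to a cubic term of order 1e-48 that is ignored here.
exact = PI - x
modeled = PI66 - x           # result if the reduction uses the 66-bit pi

rel_err = abs(modeled - exact) / exact
err_ulps = abs(modeled - exact) / Fraction(math.ulp(float(exact)))

print(f"relative error: {float(rel_err):.3e}")
print(f"error in double-precision ulps: {float(err_ulps):.3e}")
```

The tiny absolute error in the stored π becomes a large relative error because the true result near a multiple of π is itself tiny, which is why the quoted guidance restricts the argument range.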

Intel further acknowledges that the 1 ulp error bound for the logarithm instructions FYL2X and FYL2XP1 holds only when y = 1 (this was not clear in some older Intel documentation):

The FYL2X and FYL2XP1 instructions are two operand instructions and are guaranteed to be within 1 ulp only when y is 1. When y is not 1, the maximum ulp error is always within 1.35

Using a multiple-precision library, it's not hard to put Intel's claims to the test. To collect the following data, I used Richard Brent's MP library as the reference and ran 2^31 random test cases in each of the indicated intervals:

Intel Xeon CPU E3-1270 v2 "IvyBridge", Intel64 Family 6 Model 58 Stepping 9, GenuineIntel

2xm1 [-1,1]        max. ulp = 0.898306 at x = -1.8920e-001 (BFFC C1BED062 C071D472)
sin [-2.82,+2.82]  max. ulp = 0.706783 at x =  5.1323e-001 (3FFE 8362D6B1 FC93DFA0)
cos [-1.41,+1.41]  max. ulp = 0.821634 at x = -1.3201e+000 (BFFF A8F8486E 591A59D7)
tan [-1.41,+1.41]  max. ulp = 0.990388 at x =  1.3179e+000 (3FFF A8B0CAB9 0039C790)
atan [-1,1]        max. ulp = 0.747328 at x =  1.2252e-002 (3FF8 C8BB9E06 B9EB4DF8), y =  3.9204e-001 (3FFD C8B8DC94 AA6655B4)
yl2x [0.5,2.0]     max. ulp = 0.994396 at x =  1.0218e+000 (3FFF 82C95B56 8A70EB2D), y =  1.0000e+000 (3FFF 80000000 00000000)
yl2x [1.0,1.2]     max. ulp = 1.202769 at x =  1.0915e+000 (3FFF 8BB70F1B C5F7E103), y = -9.8934e-001 (BFFE FD453A23 AC926478)
yl2xp1 [-0.7,1.44] max. ulp = 0.990469 at x =  2.1709e-002 (3FF9 B1D61A98 BF349080), y =  1.0000e+000 (3FFF 80000000 00000000)
yl2xp1 [-1, 1]     max. ulp = 1.206979 at x =  9.1169e-002 (3FFB BAB69127 C1D5C158), y = -9.9281e-001 (BFFE FE28A91F 132F0C35)

      

While such non-exhaustive testing cannot prove error bounds, the maximum errors found appear to confirm Intel's documentation.
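The testing method described above can be sketched on a much smaller scale as follows. This is an illustration of the methodology only: Python's `math.cos` calls the platform C library rather than the x87 FCOS instruction, errors are measured in double-precision ulps rather than extended-precision ulps, and the `decimal` module stands in for a multiple-precision library. The helper names (`cos_reference`, `ulp_error`, `max_ulp_error`) are hypothetical.

```python
import math
import random
from decimal import Decimal, localcontext

PREC = 60  # working precision of the reference, in decimal digits

def cos_reference(x: float) -> Decimal:
    """High-precision cosine via Taylor series; intended for small |x|."""
    with localcontext() as ctx:
        ctx.prec = PREC
        x2 = Decimal(x) ** 2
        term = total = Decimal(1)
        tiny = Decimal(10) ** -(PREC - 5)
        k = 0
        while abs(term) > tiny:
            k += 1
            term = -term * x2 / ((2 * k - 1) * (2 * k))
            total += term
        return total

def ulp_error(computed: float, exact: Decimal) -> float:
    """Distance between computed and exact, in ulps of the exact result."""
    with localcontext() as ctx:
        ctx.prec = PREC
        ulp = Decimal(math.ulp(float(exact)))
        return float(abs(Decimal(computed) - exact) / ulp)

def max_ulp_error(func, reference, lo, hi, trials, seed=0):
    """Largest ulp error seen over `trials` uniform random arguments."""
    rng = random.Random(seed)
    worst, worst_x = 0.0, None
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        err = ulp_error(func(x), reference(x))
        if err > worst:
            worst, worst_x = err, x
    return worst, worst_x

worst, at = max_ulp_error(math.cos, cos_reference, -1.41, 1.41, trials=2000)
print(f"cos [-1.41,+1.41]  max. ulp = {worst:.6f} at x = {at:.4e}")
```

A few thousand samples only give a lower bound on the worst case, which is the same caveat that applies to the 2^31-sample runs above.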

I don't have a modern AMD processor to test, but I do have test data for an older 32-bit Athlon processor. Full disclosure: I developed the algorithms for the transcendental function instructions used in the 32-bit Athlon processors. My target was an error of less than 1 ulp for all instructions; however, the caveat already mentioned above about argument reduction with a 66-bit machine π applies to the trigonometric functions here as well.

Athlon XP-2100 "Palomino", x86 Family 6 Model 6 Stepping 2, AuthenticAMD

2xm1 [-1,1]        max. ulp = 0.720006 at x =  5.6271e-001 (3FFE 900D9E90 A533535D)
sin [-2.82, +2.82] max. ulp = 0.663069 at x = -2.8200e+000 (C000 B47A7BB2 305631FE)
cos [-1.41, +1.41] max. ulp = 0.671089 at x = -1.3189e+000 (BFFF A8D0CF9E DC0BCA43)
tan [-1.41, +1.41] max. ulp = 0.783821 at x = -1.3225e+000 (BFFF A947067E E3F4C39C)
atan [-1,1]        max. ulp = 0.665893 at x =  5.5333e-001 (3FFE 8DA6B606 C58B206A) y =  5.5169e-001 (3FFE 8D3B9DC8 5EA87546)
yl2x [0.4,2.5]     max. ulp = 0.716276 at x =  6.9826e-001 (3FFE B2C128C3 0EF1EC00) y = -1.2062e-001 (BFFB F7064049 BC362838)
yl2xp1 [-1,4]      max. ulp = 0.691403 at x =  1.9090e-001 (3FFC C37C0397 F8184934) y = -2.4796e-001 (BFFC FDE93CA9 980BF78C)

      

The AMD64 Architecture Programmer's Manual, Volume 1, Section 6.4.5.1 "Accuracy of Transcendental Results" documents the error bounds as follows:

x87 computations are carried out in double-extended-precision format, so that the transcendental functions provide results accurate to within one unit in the last place (ulp) for each of the floating-point data types.

+2








