X86: latency and throughput of transcendental functions

The Intel® 64 and IA-32 Architectures Optimization Reference Manual provides latency and throughput data for various CPU instructions.

For transcendental functions (FSIN etc.), some numbers are listed as ranges (page C-29). Footnote 4 explains:

The latency and throughput of transcendental instructions can vary substantially in a dynamic execution environment. For these instructions, only an approximate value or a range of values is given.

My question is: what factors affect the throughput and latency of such instructions? I assume the value of the argument is one factor. Are there others?



2 answers


Apart from the argument, the mix of other instructions in flight can affect latency and throughput. These instructions are microcoded, which means they generate a sequence of μops that must compete with other instructions for execution resources; when such contention occurs, performance can suffer.
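As a rough illustration of the contention effect, here is a toy model in Python. It is not a simulation of any real microarchitecture: the single shared port, the one-μop-per-cycle rule, and the count of 100 μops are all assumptions made up for this sketch, and `microcoded_latency` is a hypothetical helper.

```python
def microcoded_latency(num_uops, port_taken):
    """Count cycles until num_uops dependent uops have issued on one port.

    port_taken(cycle) returns True when some other in-flight instruction
    occupies the port in that cycle (hypothetical contention pattern).
    """
    cycle = 0
    issued = 0
    while issued < num_uops:
        if not port_taken(cycle):
            issued += 1  # the port is free, so our next uop issues
        cycle += 1       # otherwise the microcoded sequence stalls
    return cycle


UOPS = 100  # hypothetical uop count for one transcendental instruction

# Nothing else in flight: one uop issues every cycle.
alone = microcoded_latency(UOPS, lambda cycle: False)

# A neighbouring instruction stream claims the port every other cycle.
contended = microcoded_latency(UOPS, lambda cycle: cycle % 2 == 0)

print(alone, contended)
```

In this model the same instruction takes twice as long once a neighbour claims the port every other cycle, which is the kind of variation a single table entry cannot capture.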





The x87 control word determines the precision of the computation (64-bit, 53-bit, or 24-bit mantissa) and can affect the performance of transcendental functions, especially those that internally use a division or square root. In general, I advise avoiding the x87 trigonometric instructions because, by design, they are very imprecise for large input values.
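The imprecision can be sketched without touching the FPU. The Python snippet below assumes that FSIN reduces its argument using an internal 66-bit approximation of π, and it emulates only that reduction step with exact rational arithmetic. `PI_66` and `sin_near_pi` are names invented for this sketch, and the 50-digit value of π is hard-coded.

```python
from fractions import Fraction
import math

# pi to 50 decimal places, far more than a double or the x87 constant holds.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# Assumed x87 constant: pi kept to 66 significant bits. Since 2 <= pi < 4,
# that leaves 64 fractional bits.
PI_66 = Fraction(round(PI * 2**64), 2**64)


def sin_near_pi(x, pi_value):
    """Approximate sin(x) for x extremely close to pi as pi_value - x.

    sin(x) = sin(pi - x), and for such a tiny angle the sine equals the
    angle up to a cubic term that is negligible here.
    """
    return float(pi_value - Fraction(x))


x = math.pi  # the double nearest to pi, not pi itself
reference = sin_near_pi(x, PI)    # what an accurate sin(x) should return
emulated = sin_near_pi(x, PI_66)  # reduction with the 66-bit constant
rel_error = abs(emulated - reference) / reference

print(f"reference:      {reference:.17e}")
print(f"emulated:       {emulated:.17e}")
print(f"relative error: {rel_error:.1e}")
```

Under these assumptions the relative error is on the order of 1e-5, so only four or five significant digits survive even for an input this small. The absolute error of the reduction scales with the number of multiples of π that are removed, which is why large arguments fare worst.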









