x86: latency and throughput of transcendental functions
The Intel® 64 and IA-32 Architectures Optimization Reference Manual lists latency and throughput figures for many CPU instructions.
For transcendental instructions (FSIN etc.), some of the figures are given as ranges (page C-29). Footnote 4 explains:
The latency and throughput of transcendental instructions can vary significantly in a dynamic execution environment. For these instructions, only an approximate value or range of values is given.
My question is: what factors affect the latency and throughput of these instructions? I believe the value of the argument is one of them. Are there others?
Apart from the argument, the mix of other instructions in flight can affect latency and throughput. These instructions are microcoded: each one expands into a sequence of μops that must compete with other instructions for execution resources, and when there is contention, performance can suffer.
The x87 control word determines the precision of the computation (64-bit, 53-bit, or 24-bit significand) and can affect the performance of transcendental instructions, especially those that use division or square root internally. In general, I advise avoiding the x87 trigonometric instructions, because by design they are very imprecise for large inputs and near multiples of π: their argument reduction uses an approximation of π with only 66 significant bits.