What is the difference between Add12 and the Kahan summation algorithm?

Consider the function Add12Cond below, taken from the CRlibm docs, with a couple of obvious typos corrected:

Let a and b be floating-point numbers. The following method computes two floating-point numbers s and r such that s + r = a + b exactly, and s is the floating-point number closest to a + b.

/* ABS is assumed to be an absolute-value macro along these lines: */
#define ABS(x) ((x) >= 0.0 ? (x) : -(x))

/* Computes s and r such that s + r == a + b exactly, where s is the
   correctly rounded sum of a and b and r is the rounding error. */
void Add12Cond(double *s, double *r, double a, double b) {
  double z;
  *s = a + b;
  if (ABS(a) > ABS(b)) {
    z = *s - a;   /* exact: recovers the contribution of b to s */
    *r = b - z;   /* the part of b that was rounded away */
  } else {
    z = *s - b;
    *r = a - z;
  }
}

This looks very similar to the Kahan summation algorithm applied to a and b, with one obvious difference: the Kahan summation algorithm does not first determine which of a or b has the greater absolute value. It just takes the terms as they come (and usually more than two of them).
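
For reference, here is a minimal sketch of Kahan summation in C (my own sketch, not from the CRlibm docs; the name kahan_sum is hypothetical). Note that the compensation step is exactly the |a| > |b| branch of Add12Cond, applied with the running sum in the role of the presumed-larger operand:

#include <stddef.h>

/* Compensated (Kahan) summation: c carries the negated rounding error
   of the previous addition and is fed back into the next term. */
double kahan_sum(const double *x, size_t n) {
  double sum = 0.0;
  double c = 0.0;                 /* running compensation */
  for (size_t i = 0; i < n; i++) {
    double y = x[i] - c;          /* correct the incoming term */
    double t = sum + y;           /* sum is *assumed* to be the larger operand */
    c = (t - sum) - y;            /* (t - sum) recovers what was actually added;
                                     subtracting y leaves the negated error */
    sum = t;
  }
  return sum;
}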

I think the Handbook of Floating-Point Arithmetic invites the reader to think about this difference, but it does not provide any explanation. Anyway, I've been thinking about it for a while, and I still have no intuition as to why the Kahan summation algorithm can get away without the test ABS(a) > ABS(b). (I don't have the book at hand right now, hence this question.)



1 answer


Loosely speaking, Kahan summation settles for a weaker error guarantee than Add12 does: its error is bounded in terms of the sum of the absolute values of the inputs.

Looking at Add12, it certainly computes s as the float closest to a + b. The conditional is there to make sure that z is computed exactly (casework!), and hence that r is exactly the "rest" of a + b. In particular, you get r + s = a + b and |r| <= 0.5 ulp(s).
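
As a quick sanity check (a hypothetical example, assuming the Add12Cond above is in scope): take a = 1.0 and b = 1e-17. Since |b| is well below 0.5 ulp(1.0), s rounds to 1.0 and r must carry all of b:

#include <stdio.h>

int main(void) {
  double s, r;
  Add12Cond(&s, &r, 1.0, 1e-17);
  /* Expected: s = 1 and r = 1e-17, so that s + r equals a + b exactly
     and |r| <= 0.5 ulp(s). */
  printf("s = %.17g, r = %.17g\n", s, r);
  return 0;
}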

Had we taken the wrong branch in Add12, then as long as the two values didn't differ in magnitude by more than a factor of roughly 2/eps, z would still be computed with an error of at most 0.5 ulp(z), and hence *r would be computed with an error of at most 1 ulp(z). So picking one of the two branches unconditionally means we accumulate error proportional to the ulp of whatever we assumed to be smaller. Kahan summation unconditionally assumes the new summand is the smaller one, so it accumulates a total error roughly proportional to the sum of the absolute values of the inputs.
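
To see the failure mode concretely, here is a hypothetical wrong-branch example: run the |a| > |b| branch of Add12Cond by hand even though a is the tiny operand. The remainder is lost entirely, though the damage stays within 1 ulp(z):

#include <stdio.h>

int main(void) {
  double a = 1e-17, b = 1.0;   /* note |a| < |b| */
  double s = a + b;            /* rounds to 1.0 */
  double z = s - a;            /* 1.0 - 1e-17 rounds back to 1.0 */
  double r = b - z;            /* 0.0: the true remainder 1e-17 is lost,
                                  an error of well under 1 ulp(z) */
  printf("s = %.17g, r = %.17g (exact remainder: 1e-17)\n", s, r);
  return 0;
}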

Kahan, in his original half-page paper describing the summation algorithm, wrote the following, which conveys the wild optimism of the 1960s to me in a way Star Trek never could:



The convenient availability of double precision in many FORTRAN and some ALGOL compilers indicates that double precision will soon be found universally acceptable as a substitute for ingenuity in solving numerical problems.

Unfortunately, that half-page paper gives neither a bound nor a proof. A bound on the error of Kahan summation appears as exercise 19 in section 4.2.2 of TAOCP; the exercise states that the error in summing x_1, ..., x_n by Kahan's method is bounded by (2 eps + O(n eps^2)) * (|x_1| + ... + |x_n|).
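
To get a feel for that bound, here is a sketch (the inputs and the use of DBL_EPSILON for eps are my choices, not TAOCP's): sum ten million copies of 0.1, which is not exactly representable in binary, and compare the naive and compensated results against the leading term 2 eps * sum(|x_i|):

#include <stdio.h>
#include <float.h>

int main(void) {
  const int n = 10000000;
  double naive = 0.0, kahan = 0.0, c = 0.0;
  for (int i = 0; i < n; i++) {
    double x = 0.1;
    naive += x;                /* plain accumulation */
    double y = x - c;          /* Kahan step, as in the sketch above */
    double t = kahan + y;
    c = (t - kahan) - y;
    kahan = t;
  }
  /* Leading term of the bound: 2 eps * sum |x_i|, with sum |x_i| ~ 0.1 * n. */
  double bound = 2.0 * DBL_EPSILON * 0.1 * n;
  printf("naive = %.17g\nkahan = %.17g\nbound = %.3g\n", naive, kahan, bound);
  /* Typically the naive sum drifts far outside this bound,
     while the compensated sum stays within it. */
  return 0;
}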

I was going to give a proof based on the setup in TAOCP here, but I kept botching it embarrassingly. Fortunately, I found that David Goldberg did this in the appendix to What Every Computer Scientist Should Know About Floating-Point Arithmetic.
