Repeatedly adding a floating-point number
I'm looking at some legacy code that defines some grid points along an axis f.
int main(){
    double f[9];
    int i = 0;
    for (double a = 0; a <= 1; a += 0.125){
        f[i++] = a;
    }
}
I am worried about the repeated addition of 1/8 to a, and about the loop not terminating where it should. This is because I don't think you can add floating-point values like this and rely on a being exactly 1 when i is 8.
Or is this code OK and should I stop worrying? (The code is at least 20 years old, apparently, and has never caused problems - although double a was declared outside the loop in the original version, and I have read why that was done.)
The code is fine when compiled with a C compiler that provides either exactly IEEE 754 semantics or a close approximation of them (say FLT_EVAL_METHOD > 0, or even arbitrary excess precision for subexpressions and arbitrary roundings to nominal precision).
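To find out which of these situations applies to a particular compiler, a small sketch along the following lines may help. It uses only the standard FLT_RADIX, DBL_MANT_DIG and FLT_EVAL_METHOD macros from <float.h> and the optional __STDC_IEC_559__ macro; the function names claims_iec_559 and print_float_env are made up for this illustration. Note that a compiler may follow IEEE 754 in practice without defining __STDC_IEC_559__, so a "not defined" result is inconclusive.

```c
#include <assert.h>  /* static_assert (C11) */
#include <float.h>
#include <stdio.h>

/* Refuse to compile on a non-binary floating-point format: the claim that
   0.125 is exactly representable relies on base 2. */
static_assert(FLT_RADIX == 2, "expected a binary floating-point format");

/* Hypothetical helper: returns 1 if the implementation claims IEC 60559
   (IEEE 754) conformance through __STDC_IEC_559__, and 0 otherwise. */
int claims_iec_559(void) {
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the properties that matter for the loop. */
void print_float_env(void) {
    /* 0 means expressions are evaluated in their own type; a positive value
       means a wider type is used; -1 means indeterminable. */
    printf("FLT_EVAL_METHOD  = %d\n", (int)FLT_EVAL_METHOD);
    printf("DBL_MANT_DIG     = %d\n", DBL_MANT_DIG);
    printf("__STDC_IEC_559__ = %s\n",
           claims_iec_559() ? "defined" : "not defined");
}
```

Calling print_float_env() once at program start is enough to record the environment the grid code was built for.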
The problems you are worried about are:
- representation error, where 0.125 is not exactly 1/8, and
- operation error, where + is not exactly mathematical addition.
Neither of these happens for this particular program.
0.125 requires only 1 bit of precision to be represented exactly in base 2. This means that the floating-point number used by the program under these conditions is exactly 1/8. Moreover, it can be added to itself 2^53 times before any approximation is introduced.
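As a quick check of that claim, here is a minimal sketch. The helper name accumulate is hypothetical and only mirrors what the loop in the question does; it assumes double is an IEEE 754 binary64 type or a close approximation.

```c
#include <assert.h>

/* Hypothetical helper: starts from 0.0, adds `step` to it `n` times,
   and returns the running sum, just like the variable `a` in the loop. */
double accumulate(double step, int n) {
    assert(n >= 0);
    double a = 0.0;
    for (int k = 0; k < n; k++) {
        a += step;  /* exact as long as every partial sum is representable */
    }
    return a;
}
```

Under those assumptions accumulate(0.125, 8) == 1.0 holds, because every partial sum is a small multiple of 1/8 and is therefore exactly representable.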
This reasoning does not hold for other values of the step. For example, the modified version of your program below leaves the array element f[100] uninitialized, at least with my compiler (which implements strict IEEE 754 semantics):
#include <stdio.h>

int main(){
    double f[101];
    int i = 0;
    for (double a = 0; a <= 1; a += 0.01){
        printf("%.16e %d\n", a, i);  /* show the accumulated value and its index */
        f[i++] = a;
    }
}
When I run it, the last lines of output are:
...
9.8000000000000065e-01 98
9.9000000000000066e-01 99
f[100] is never written, because of the representation and operation errors that occur when 0.01 is repeatedly added to itself in binary floating point.
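One common way to avoid this, which is not what the legacy code does, is to drive the loop with an integer index and compute each grid point from it. The sketch below assumes the grid runs from 0 to 1 inclusive; the function name fill_unit_grid is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills f[0..n-1] with n evenly spaced points from
   0 to 1 inclusive. Each point is computed from the integer index, so
   rounding error does not accumulate, and the number of iterations does
   not depend on a floating-point comparison. */
void fill_unit_grid(double *f, size_t n) {
    assert(n >= 2);  /* at least the two end points are needed */
    for (size_t i = 0; i < n; i++) {
        f[i] = (double)i / (double)(n - 1);
    }
}
```

With double f[101], the call fill_unit_grid(f, 101) writes every element, including f[100]. Each value carries at most one rounding error from the division instead of the sum of up to 100 of them.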
The C standard leaves the floating-point representation implementation-defined, although many compilers use IEEE 754. Your code is not good because:
- It is only correct when compiled for IEEE 754, or for something similar with the same behavior.
- It is difficult to read. When you read the code, the exact behavior is not obvious.
- It is the wrong way to iterate over an array in C.
Imagine someone who does not know about this requirement compiles it for a platform that does not use IEEE 754. The behavior could be completely different. For example, your code might write out of bounds, which is undefined behavior, and your program might crash.
Here is sample code that has the same behavior as yours when compiled for IEEE 754:
#include <stddef.h>

int main(void) {
    double f[9];
    size_t const size_f = sizeof f / sizeof *f;
    double a = 0;
    for (size_t i = 0; i < size_f; i++) {
        f[i] = a;
        a += 0.125;
    }
}