Smart way to do computations near int overflow
Is there any clever way to fix this problem?
uint32_t a = 16637510;
uint32_t b = 45627362;
uint32_t c = 0;
c = a * 100000 / b //overflows
c = (a * 100/b)*1000 //gives 36000
I need to get the result c = 36463 or better 36464. And fast, non-floating operations are needed. CPU - stm32f4
Update:
The accepted answer converts 100000 to 100000ULL (64 bit), but as @PeterJ suggested (and removed his answer) using stm32f4 FPU is faster than 64 split operations
Timer t;
int i;
t.start();
for(i = 1; i <= 100000; ++i) c = a * 100000ULL / b;
t.stop();
printf("64\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();
t.start();
for(i = 1; i <= 100000; ++i) c = (uint32_t)((float)a * 100000.0f / (float)b);
t.stop();
printf("float\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();
64 takes 0.086669 seconds, du - 57333
float takes 0.017779 seconds, du - 57333
source to share
How about this?
c = a * 100000ULL / b; // gives 36463
See https://godbolt.org/g/aemCyw for the build that generates gcc for this operation and the original c = a * 100000 / b
one that overflows. Note that instead is __aeabi_uidiv
used __aeabi_uldivmod
.
source to share
When 64-bit math is expensive, sometimes a 32-bit approximate solution can be significantly faster. Depends on the processor / compiler.
Let's see what can be done using only 32-bit math.
b == 100000 == 0x186A0
and suppose it is fixed - a 17-bit number.
a == 16637510 == 0x00FDDE46
but the OP says it is in the +/- 1000 range. So it is a 24-bit number. b
is a 26-bit number. With these constraints, the final factor will always be around 36464 (16-bit number)
We can scale the product operands a,b
to use 16 or more significant bits a
and 16 or so most significant bits b
without losing significant value. Then we have a 16-bit * 16-bit product that won't overwhelm 32-bit math.
We can take advantage of the fact that b
it only has 12 significant bits, leaving the code to use up to 20 (32-12) of the most significant bits of a 24-bit a
in the product.
The intermediate is 41 bits, so we need to scale the multiplication at least 9 bits.
#define SCALE_A 4
#define SCALE_M 5
// Insure SCALE_A + SCALE_M >= 9 to avoid overflow
// Perhaps other scales like SCALE_A 8, SCALE_M 1 will be faster.
uint32_t scale(uint32_t a, uint32_t b) {
uint32_t product = (a >> SCALE_A)*(100000 >> SCALE_M);
uint32_t c = product/(b >> (SCALE_A + SCALE_M));
return c;
}
If it's faster / better for the OP? May be. Just a different approach to consideration. I'll leave it for users to connect to for performance profiling.
source to share