Smart way to do computations near int overflow

Is there any clever way to fix this problem?

uint32_t a = 16637510;
uint32_t b = 45627362;
uint32_t c = 0;
c = a * 100000 / b //overflows
c = (a * 100/b)*1000 //gives 36000

      

I need to get the result c = 36463 or better 36464. And fast, non-floating operations are needed. CPU - stm32f4

Update:

The accepted answer converts 100000 to 100000ULL (64 bit), but as @PeterJ suggested (and removed his answer) using stm32f4 FPU is faster than 64 split operations

Timer t;
int i;
t.start();
for(i = 1; i <= 100000; ++i) c = a * 100000ULL / b;
t.stop();
printf("64\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();
t.start();
for(i = 1; i <= 100000; ++i) c = (uint32_t)((float)a * 100000.0f / (float)b);
t.stop();
printf("float\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();

      

64 takes 0.086669 seconds, du - 57333
float takes 0.017779 seconds, du - 57333

+3


source to share


2 answers


How about this?

c = a * 100000ULL / b; // gives 36463

      



See https://godbolt.org/g/aemCyw for the build that generates gcc for this operation and the original c = a * 100000 / b

one that overflows. Note that instead is __aeabi_uidiv

used __aeabi_uldivmod

.

+1


source


When 64-bit math is expensive, sometimes a 32-bit approximate solution can be significantly faster. Depends on the processor / compiler.

Let's see what can be done using only 32-bit math.


b == 100000 == 0x186A0

and suppose it is fixed - a 17-bit number.

a == 16637510 == 0x00FDDE46

but the OP says it is in the +/- 1000 range. So it is a 24-bit number. b

is a 26-bit number. With these constraints, the final factor will always be around 36464 (16-bit number)



We can scale the product operands a,b

to use 16 or more significant bits a

and 16 or so most significant bits b

without losing significant value. Then we have a 16-bit * 16-bit product that won't overwhelm 32-bit math.

We can take advantage of the fact that b

it only has 12 significant bits, leaving the code to use up to 20 (32-12) of the most significant bits of a 24-bit a

in the product.

The intermediate is 41 bits, so we need to scale the multiplication at least 9 bits.

#define SCALE_A 4
#define SCALE_M 5
// Insure SCALE_A + SCALE_M >= 9 to avoid overflow
// Perhaps other scales like SCALE_A 8, SCALE_M 1 will be faster.

uint32_t scale(uint32_t a, uint32_t b) {
  uint32_t product = (a >> SCALE_A)*(100000 >> SCALE_M);
  uint32_t c = product/(b >> (SCALE_A + SCALE_M));
  return c;
}

      

If it's faster / better for the OP? May be. Just a different approach to consideration. I'll leave it for users to connect to for performance profiling.

0


source







All Articles