Converting large integer to float

I'm trying to convert an integer to float like this (simplified):

int64_t x = -((int64_t)1 << 63);
float y = x;

With MSVC 2013 on 64-bit Windows 7 this works fine, but with gcc 4.8 on Ubuntu 14.04 64-bit I get a positive value for y. I turned off all optimizations and looked at the variables in gdb. I even tried evaluating the expression directly in gdb to find the cause of the problem:

(gdb) print (float)(-((int64_t)1 << 63))
$33 = 9.22337204e+18

(gdb) print (float)(-9223372036854775808)
$39 = 9.22337204e+18

As you can see, not even adding explicit casts solves the problem. I am a little confused, as float should be able to hold much larger numbers (in absolute value). sizeof(float) == 4 and sizeof(size_t) == 8, in case it matters. The value -2^63 seems to be some magic limit, since -2^63 + 1 converts perfectly fine:

(gdb) print (float)(-((int64_t)1 << 63) + 1)
$44 = -9.22337149e+18

Why is the sign lost when converting values <= -2^63? The value -2^63 can be represented by both int64_t and float, and the conversion works on the other platform as described above.
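For reference, a quick standalone check of the ranges involved (a small sketch using only standard headers) confirms that float's range itself is not the problem:

#include <stdio.h>
#include <stdint.h>
#include <float.h>

int main(void)
{
    /* FLT_MAX is about 3.4e+38, vastly larger in magnitude
       than -2^63 (about -9.2e+18) */
    printf("FLT_MAX   = %e\n", FLT_MAX);
    printf("INT64_MIN = %lld\n", (long long)INT64_MIN);
    return 0;
}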



2 answers


The expression (int64_t)1 << 63 shifts a 1 into the sign bit, so it is Undefined Behavior.

Even if the shift succeeded and produced 0x8000000000000000, that is the minimum (most negative) value an int64_t can hold. Negating it with

-((int64_t)1 << 63)

then asks for a positive value outside the range of a signed 64-bit int, which is Undefined Behavior again.
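A minimal sketch of a well-defined alternative: take the value from the INT64_MIN macro in <stdint.h> instead of building it with a shift.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t x = INT64_MIN;  /* no shift into the sign bit, no negation overflow */
    float y = (float)x;     /* -2^63 is a power of two, exactly representable in float */
    printf("x = %lld, y = %e\n", (long long)x, y);
    return 0;
}

On a conforming implementation this prints y = -9.223372e+18, with the sign intact.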



To avoid the undefined behavior, compute the value with the standard function ldexp, which multiplies its first argument by a power of two: -ldexp(1.0, 63).
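A short usage sketch of that suggestion; ldexp(1.0, 63) evaluates 1.0 * 2^63 entirely in floating point, so no integer overflow can occur (link with -lm on some platforms):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double d = -ldexp(1.0, 63);  /* -(1.0 * 2^63), computed in floating point */
    float f = (float)d;          /* a power of two: exactly representable in float */
    printf("d = %e, f = %e\n", d, f);
    return 0;
}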









