Converting large integer to float
I'm trying to convert an integer to float like this (simplified):
int64_t x = -((int64_t)1 << 63);
float y = x;
With MSVC 2013 on 64-bit Windows 7 this works fine, but with gcc 4.8 on Ubuntu 14.04 64-bit I get a positive value for y. I turned off all optimizations and looked at the variables in gdb. I even tried evaluating with gdb directly to find the cause of the problem:
(gdb) print (float)(-((int64_t)1 << 63))
$33 = 9.22337204e+18
(gdb) print (float)(-9223372036854775808)
$39 = 9.22337204e+18
As you can see, even adding explicit casts does not solve the problem. I am a little confused, as a float should be able to hold much larger numbers (in absolute value). sizeof(float) == 4 and sizeof(size_t) == 8, in case it matters. The value -2^63 seems to be some magic limit, since -2^63 + 1 converts perfectly fine:
(gdb) print (float)(-((int64_t)1 << 63) + 1)
$44 = -9.22337149e+18
Why is the sign lost when converting values <= -2^63? The value -2^63 can be represented by both int64_t and float, and the conversion works on other platforms as described above.
The expression (int64_t)1 << 63 shifts a 1 into the sign bit, which is undefined behavior. Even if the shift succeeded and produced 0x8000000000000000, that is the minimum (and negative) value an int64_t can hold, so negating it with -((int64_t)1 << 63) asks for +2^63, a value outside the range of a signed 64-bit int — again undefined behavior.
To avoid the undefined behavior, compute the power of two in floating point with the standard function ldexp, which multiplies its first argument by 2 raised to its second: -ldexp(1.0, 63).