Converting large integer to float
I'm trying to convert an integer to float like this (simplified):
int64_t x = -((int64_t)1 << 63);
float y = x;
With MSVC 2013 on 64-bit Windows 7 this works fine, but with gcc 4.8 on Ubuntu 14.04 64-bit I get a positive value for y. I turned off all optimizations and looked at the variables in gdb. I even tried evaluating with gdb directly to find the cause of the problem:
(gdb) print (float)(-((int64_t)1 << 63))
$33 = 9.22337204e+18
(gdb) print (float)(-9223372036854775808)
$39 = 9.22337204e+18
As you can see, even adding explicit casts does not solve the problem. I am a little confused, as a float should be able to hold much larger numbers (in absolute value). sizeof(float) == 4 and sizeof(size_t) == 8, in case it matters. The value -2^63 seems to be some magic limit, since -2^63 + 1 converts perfectly fine:
(gdb) print (float)(-((int64_t)1 << 63) + 1)
$44 = -9.22337149e+18
Why is the sign lost when converting values <= -2^63? The value -2^63 can be represented by both int64_t and float, and the conversion works on other platforms as described above.
The expression (int64_t)1 << 63 shifts a 1 into the sign bit, which is undefined behavior. Even if the shift succeeded and produced 0x8000000000000000, that is the minimum (and negative) value an int64_t can hold, so negating it with -((int64_t)1 << 63) asks for +2^63, a value outside the range of a signed 64-bit int — again undefined behavior.
To avoid the undefined behavior, compute the power of two in floating point with the standard function ldexp, which multiplies its first argument by 2 raised to its second: -ldexp(1.0, 63).