Floating point limit code does not give correct results

I am racking my brains trying to figure out why this code is not giving the correct result. I'm looking for the hexadecimal representations of the positive and negative floating-point overflow/underflow limits. The code is based on this site and the Wikipedia entry:

7f7f ffff ≈ 3.4028234 × 10^38 (max single precision), from the Wikipedia entry; this corresponds to positive overflow.

Here's the code:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>

using namespace std;

int main(void) {

    float two = 2;
    float twentyThree = 23;
    float one27 = 127;
    float one49 = 149;


    float posOverflow, negOverflow, posUnderflow, negUnderflow;

    posOverflow = two - (pow(two, -twentyThree) * pow(two, one27));
    negOverflow = -(two - (pow(two, one27) * pow(two, one27)));


    negUnderflow = -pow(two, -one49);
    posUnderflow = pow(two, -one49);


    cout << "Positive overflow occurs when value greater than: " << hex << *(int*)&posOverflow << endl;


    cout << "Neg overflow occurs when value less than: " << hex << *(int*)&negOverflow << endl;


    cout << "Positive underflow occurs when value greater than: " << hex << *(int*)&posUnderflow << endl;


    cout << "Neg overflow occurs when value greater than: " << hex << *(int*)&negUnderflow << endl;

}

Output:

Positive overflow occurs when value greater than: f3800000
Neg overflow occurs when value less than: 7f800000
Positive underflow occurs when value greater than: 1
Neg overflow occurs when value greater than: 80000001

To get the floating point hexadecimal representation, I use the method described here:

Why is the code not working? I know this will work if positive overflow = 7f7f ffff.



3 answers


Your expression for the largest representable positive float is incorrect. The page you link uses (2 - pow(2, -23)) * pow(2, 127), while you have 2 - (pow(2, -23) * pow(2, 127)). Likewise for the smallest representable negative float.

However, your underflow expressions look correct, and the hex output printed for them is right.



Note that posOverflow and negOverflow are just +FLT_MAX and -FLT_MAX. But note that your posUnderflow and negUnderflow are actually smaller in magnitude than FLT_MIN (because they are denormal, and FLT_MIN is the smallest positive normal float).



Floating point loses precision as the magnitude increases. A value near 2^127 does not change when 2 is added to it.

Other than that, I can't follow your code. Using words to represent the numbers makes it hard to read.

Here's a standard way to get your device's floating point limits:

#include <limits>
#include <iostream>
#include <iomanip>

std::ostream &show_float( std::ostream &s, float f ) {
    s << f << " = ";
    std::ostream s_hex( s.rdbuf() );
    s_hex << std::hex << std::setfill( '0' );
    for ( char const *c = reinterpret_cast< char const * >( & f );
          c != reinterpret_cast< char const * >( & f + 1 );
          ++ c ) {
        s_hex << std::setw( 2 ) << ( static_cast< unsigned int >( * c ) & 0xff );
    }
    return s;
}

int main() {
    std::cout << std::hex;
    std::cout << "Positive overflow occurs when value greater than: ";
    show_float( std::cout, std::numeric_limits< float >::max() ) << '\n';
    std::cout << "Neg overflow occurs when value less than: ";
    show_float( std::cout, - std::numeric_limits< float >::max() ) << '\n';
    std::cout << "Positive underflow occurs when value less than: ";
    show_float( std::cout, std::numeric_limits< float >::min() ) << '\n';
    std::cout << "Neg underflow occurs when value greater than: ";
    show_float( std::cout, - std::numeric_limits< float >::min() ) << '\n';
}

output:

Positive overflow occurs when value greater than: 3.40282e+38 = ffff7f7f
Neg overflow occurs when value less than: -3.40282e+38 = ffff7fff
Positive underflow occurs when value less than: 1.17549e-38 = 00008000
Neg underflow occurs when value greater than: -1.17549e-38 = 00008080


The output depends on the endianness of the machine. Here the bytes appear reversed because of little-endian byte ordering.

Note: "underflow" in this case is not a sudden collapse to zero, but a gradual loss of precision through denormals. (Denormals can be disastrous for performance, though.) You can also check numeric_limits< float >::denorm_min(), which yields 1.4013e-45 = 01000000.



Your code assumes integers are the same size as floats (as does nearly everything on the page you linked, by the way). You probably want something like this instead:

for (size_t s = 0; s < sizeof(myVar); ++s) {
    unsigned char byte = reinterpret_cast<unsigned char*>(&myVar)[s];
    // the s-th byte is byte
}


that is, something similar to the code on the page you linked.

Your compiler may not use IEEE 754 types for float at all. You will need to check its documentation.

Also, consider using the constants std::numeric_limits<float>::min() / max(), or the FLT_ macros from <cfloat>, to define some of these values.


