Creating a pandas dataframe with a float64 dtype changes the last digit of its entry (quite a large number)

I tried to create a pandas DataFrame like this:

import pandas as pd
import numpy as np

pd.set_option('display.precision', 20)

a = pd.DataFrame([10212764634169927, 10212764634169927, 10212764634169927], columns=['counts'], dtype=np.float64)

      

a is returned as:

             counts
0  10212764634169928.0
1  10212764634169928.0
2  10212764634169928.0

      

So my question is, why is the last digit changed?

Thanks in advance!

EDIT: I understand it has something to do with the dtype. But why is exactly 1 added to the last digit? If I use 10212764634169926 instead, nothing changes and the value is stored as 10212764634169926. The same goes for 10212764634169928, which is returned as 10212764634169928.

1 answer


The problem is not related to pandas itself, but to the float type. If you try the following:

float(10212764634169927)
1.0212764634169928e+16

      

you can get an idea of how floating-point numbers are stored in memory (in exponential notation); note the last decimal digit. To look into this a little further, I tested the following:



a.astype('float64')
                counts
0  10212764634169928.0
1  10212764634169928.0
2  10212764634169928.0

a.astype('float32')
                counts
0  10212764362473472.0
1  10212764362473472.0
2  10212764362473472.0

      

You can see that using float32 makes the difference even bigger.
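
As for why it is exactly +1: float64 has a 53-bit significand, so between 2**53 and 2**54 only even integers can be represented. An odd value such as 10212764634169927 sits exactly halfway between two representable neighbours and is rounded to the one whose significand is even, which here is ...928. The even values ...926 and ...928 are representable, so they come back unchanged. float32 has only a 24-bit significand, so the gap near this value is far wider. Below is a minimal sketch of that reasoning, assuming NumPy is available; the variable names are illustrative and do not come from the original post.

```python
import numpy as np

n = 10212764634169927  # the value from the question

# n lies between 2**53 and 2**54, where float64 can only hold even integers.
assert 2**53 < n < 2**54

# Gap between neighbouring representable values near n, per dtype.
gap64 = float(np.spacing(np.float64(n)))
gap32 = float(np.spacing(np.float32(n)))
print(gap64, gap32)

# Round trip through float64: even neighbours survive, the odd value moves.
for k in (n - 1, n, n + 1):
    print(k, "->", int(float(k)))

# Round trip through float32 lands on a multiple of the float32 gap.
as32 = int(np.float32(n))
print(as32, as32 % int(gap32))

# An integer dtype keeps the value exactly (it fits in int64).
exact = int(np.array([n], dtype=np.int64)[0])
print(exact == n)
```

If the counts have to stay exact, storing them with an integer dtype such as np.int64 instead of np.float64 is one way to avoid the rounding, assuming no fractional or missing values are needed in that column.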
