Creating a pandas dataframe with a float64 dtype changes the last digit of its entry (quite a large number)

Question

I tried to create a pandas DataFrame as shown below:

import pandas as pd
import numpy as np

pd.set_option('display.precision', 20)

a = pd.DataFrame([10212764634169927, 10212764634169927, 10212764634169927], columns=['counts'], dtype=np.float64)

a is returned as:

             counts
0  10212764634169928.0
1  10212764634169928.0
2  10212764634169928.0

So my question is, why is the last digit changed?

Thanks in advance!

EDIT: I understand it has something to do with the dtype. But why is exactly 1 added to the last digit? If I use 10212764634169926 instead, nothing changes and the value is stored as 10212764634169926. The same goes for 10212764634169928, which is returned as 10212764634169928.

python numpy pandas data-science

snowflake 08 May '17 at 15:45

1 answer

Eric B · Answer 1 · 2017-05-08T18:29:27+0000

The problem is not related to pandas itself, but to the float type. If you try the following:

float(10212764634169927)
1.0212764634169928e+16

you can see how floating-point numbers are stored in memory (as a significand and an exponent, shown here in scientific notation); note the last digit. To look at this a little further, I tested the following:

a.astype('float64')
                counts
0  10212764634169928.0
1  10212764634169928.0
2  10212764634169928.0

a.astype('float32')
                counts
0  10212764362473472.0
1  10212764362473472.0
2  10212764362473472.0

You can see that using float32 makes the difference even bigger, because float32 keeps fewer significant bits (24, versus 53 for float64).
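As for why the last digit moves by exactly 1: the sketch below is an illustration, not part of the original test. It assumes only NumPy, and the names n, gap64 and gap32 are made up for the example. A float64 has a 53-bit significand, so between 2**53 and 2**54 only even integers can be represented. The odd number 10212764634169927 lies exactly halfway between two representable values, and the default round-half-to-even rule picks the neighbour with the even significand, which here is 10212764634169928. The even inputs 10212764634169926 and 10212764634169928 are representable already, so they come back unchanged.

```python
import numpy as np

n = 10212764634169927  # the integer from the question

# float64 has a 53-bit significand; n lies between 2**53 and 2**54,
# where consecutive float64 values are 2 apart.
print(2**53 < n < 2**54)

# Gap between neighbouring float64 values near n.
gap64 = float(np.spacing(np.float64(n)))
print(gap64)

# n is odd, so it falls halfway between two representable values and
# is rounded to the neighbour whose significand is even.
print(int(np.float64(n)))

# The even neighbour n - 1 is exactly representable and survives as-is.
print(int(np.float64(n - 1)))

# float32 has a 24-bit significand, so its gap near n is far larger.
gap32 = float(np.spacing(np.float32(n)))
print(gap32)
```

If the exact integers matter, keeping the column as an integer dtype (for example np.int64) avoids the rounding, since this value fits comfortably in 64 bits.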
