Creating a pandas dataframe with a float64 dtype changes the last digit of its entry (quite a large number)
I tried to create a pandas DataFrame like this:
import pandas as pd
import numpy as np
pd.set_option('display.precision', 20)
a = pd.DataFrame([10212764634169927, 10212764634169927, 10212764634169927], columns=['counts'], dtype=np.float64)
a is returned as:
counts
0 10212764634169928.0
1 10212764634169928.0
2 10212764634169928.0
So my question is, why is the last digit changed?
Thanks in advance!
EDIT: I understand it has something to do with the dtype. But why is the last digit specifically incremented by 1? If I use 10212764634169926 instead, nothing changes: the value is stored as 10212764634169926. The same holds for 10212764634169928, which comes back as 10212764634169928.
The problem is not related to pandas itself, but to the float type. If you try the following:
float(10212764634169927)
1.0212764634169928e+16
you can get an idea of how floating-point numbers are stored in memory (in exponential notation); note the last digit. To look into this a little further, I tested the following:
a.astype('float64')
counts
0 10212764634169928.0
1 10212764634169928.0
2 10212764634169928.0
a.astype('float32')
counts
0 10212764362473472.0
1 10212764362473472.0
2 10212764362473472.0
You can see that using float32 makes the difference even bigger.
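As for why it is +1 specifically: float64 has a 53-bit significand, and the number in the question lies between 2**53 and 2**54, where only every second integer can be represented. The sketch below is my own illustration in plain Python (the variable names are made up for this example, and math.ulp needs Python 3.9 or later). It checks the gap between neighbouring floats and shows that the odd value sits exactly halfway between two representable ones, so the IEEE 754 round-half-to-even rule decides which one wins.

```python
import math
import struct

n = 10212764634169927  # the value from the question

# float64 stores a 53-bit significand, so between 2**53 and 2**54
# only every second integer (the even ones) is representable.
assert 2**53 < n < 2**54
gap64 = math.ulp(float(n))  # distance to the next float64 (Python 3.9+)

# Even neighbours survive the int -> float -> int round trip, the odd n does not.
round_trip = {k: int(float(k)) for k in (n - 1, n, n + 1)}

# n is exactly halfway between n - 1 and n + 1. The tie goes to the
# candidate whose significand (here: value // 2) is even.
sig_below = (n - 1) // 2  # odd, so n - 1 loses the tie
sig_above = (n + 1) // 2  # even, so n + 1 wins the tie

# float32 keeps only 24 significant bits, so the gap in this range is 2**30.
as_float32 = int(struct.unpack('f', struct.pack('f', float(n)))[0])

print(gap64)
print(round_trip)
print(sig_below % 2, sig_above % 2)
print(as_float32, as_float32 % 2**30)
```

This matches what the question's edit observed: 10212764634169926 and 10212764634169928 are stored unchanged because they are even, while 10212764634169927 has to move to one of them. If the exact integer matters, an integer dtype such as int64 should avoid the rounding altogether, since the value is well within its range.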