.loc indexing changes the type
If I have a pandas.DataFrame with columns of different types (like int64 and float64), getting a single item from the int column with .loc converts the output to float:
import pandas as pd
df_test = pd.DataFrame({'ints':[1,2,3], 'floats': [4.5,5.5,6.5]})
df_test['ints'].dtype
>>> dtype('int64')
df_test.loc[0,'ints']
>>> 1.0
type(df_test.loc[0,'ints'])
>>> numpy.float64
If I use .at for indexing, it doesn't:
type(df_test.at[0,'ints'])
>>> numpy.int64
This also doesn't happen when all columns are int:
df_test = pd.DataFrame({'ints':[1,2,3], 'ints2': [4,5,6]})
df_test.loc[0,'ints']
>>> 1
Is this a consequence of some basic property of pandas indexing? In other words, is this a feature or a bug? :)
Update: It turns out that this is a bug and will be fixed in pandas 0.20.0.
The problem here is that loc implicitly tries to return a Series for the row first, even though you are asking for a single column and hence a scalar value from that row. The row's dtype is upcast to one that can hold the values of every column in that row, so the int becomes a float. If you select that particular column first and then use loc, no conversion happens:
In [83]:
df_test['ints'].loc[0]
Out[83]:
1
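As a minimal sketch of the two lookups that kept the integer type in this thread (selecting the column before calling loc, and using .at), the snippet below rebuilds the df_test frame from the question. The names via_column and via_at are made up for illustration, and the exact integer width (int64 or int32) may depend on the platform:

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ints': [1, 2, 3], 'floats': [4.5, 5.5, 6.5]})

# Select the column first, then the row label: the lookup runs on an
# integer Series, so no float column is involved.
via_column = df_test['ints'].loc[0]

# .at is the scalar accessor that the question reports as keeping the int type.
via_at = df_test.at[0, 'ints']

print(type(via_column), type(via_at))
```

Either form avoids pulling the whole mixed-type row, which is where the conversion comes from.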
You can see what happens when you select the whole row without picking a column:
In [84]:
df_test.loc[0]
Out[84]:
floats    4.5
ints      1.0
Name: 0, dtype: float64
This might not be desirable, and I think there might be a GitHub issue about it; this issue is related.
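The row upcasting described above can be checked directly. This is only a sketch: np.result_type is used here as a stand-in to show which dtype can hold both columns, and it is not claimed to be the routine pandas calls internally. The names row, common, and df_ints are made up for illustration:

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ints': [1, 2, 3], 'floats': [4.5, 5.5, 6.5]})

# A whole row must fit in one Series, so it gets a single dtype.
row = df_test.loc[0]

# The NumPy dtype able to represent both an integer and a float64 column.
common = np.result_type(*df_test.dtypes)
print(row.dtype, common)

# With only integer columns there is nothing to upcast to.
df_ints = pd.DataFrame({'ints': [1, 2, 3], 'ints2': [4, 5, 6]})
print(df_ints.loc[0].dtype)
```

This matches the observation in the question that the conversion only shows up when the frame mixes int and float columns.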