.loc indexing changes the type of the returned value

If I have a pandas.DataFrame with columns of different types (like int64 and float64), getting a single item from the int column with .loc converts the output to float:

import pandas as pd
df_test = pd.DataFrame({'ints':[1,2,3], 'floats': [4.5,5.5,6.5]})

df_test['ints'].dtype
>>> dtype('int64')

df_test.loc[0,'ints']
>>> 1.0

type(df_test.loc[0,'ints'])
>>> numpy.float64

If I use .at for indexing, it doesn't:

type(df_test.at[0,'ints'])
>>> numpy.int64

This also doesn't happen when all columns are int:

df_test = pd.DataFrame({'ints':[1,2,3], 'ints2': [4,5,6]})
df_test.loc[0,'ints']
>>> 1
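
For what it's worth, the whole row keeps int64 in this case as well; a quick extra check on the all-int df_test defined just above:

df_test.loc[0].dtype
>>> dtype('int64')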

      

Is this a consequence of some basic property of pandas indexing? In other words, is this a bug? :)

Update: it turns out that this is a bug and will be fixed in pandas 0.20.0.
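
For reference, a minimal sketch for checking which behaviour a given pandas installation shows, rebuilding the mixed-dtype df_test from the first example:

import pandas as pd

df_test = pd.DataFrame({'ints': [1, 2, 3], 'floats': [4.5, 5.5, 6.5]})

# On affected versions this prints numpy.float64; after the fix described
# above it should print numpy.int64 instead.
print(pd.__version__)
print(type(df_test.loc[0, 'ints']))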

1 answer


The problem here is that loc implicitly tries to build a Series for the row first, even though you are only returning a single column, and hence the scalar value from that row is upcast to a dtype that can hold all the dtypes in that row. If you select that particular column first and then use loc, it won't convert:

In [83]:
df_test['ints'].loc[0]

Out[83]:
1

You can see what happens when you don't select the column first:

In [84]:
df_test.loc[0]

Out[84]:
floats    4.5
ints      1.0
Name: 0, dtype: float64

This might not be desirable, and I think there is a related GitHub issue regarding this.
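
In the meantime, if you want the scalar without the upcast, any of the following should keep the original dtype; a quick sketch against the mixed-dtype df_test from the question, where .at reads a single cell directly and selecting the column first avoids building the mixed-dtype row Series:

df_test.at[0, 'ints']          # single cell by label -> numpy.int64
df_test['ints'].iloc[0]        # select the int64 column first, then by position -> numpy.int64
df_test['ints'].loc[0]         # or by label, as shown above -> numpy.int64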
