Python pandas: boolean DataFrame where() on False values returns 0 instead of False?

If I have a DataFrame with True/False values, like this:

import pandas as pd
df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False, True, False]})
print(df_mask)

    AAA    BBB    CCC
0  True  False   True
1  True  False  False
2  True  False   True
3  True  False  False

and then try to print the values in the DataFrame that are equal to False, like this:

print(df_mask[df_mask == False])
print(df_mask.where(df_mask == False))
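Both lines print the same frame because, when the key is a boolean DataFrame, pandas routes df[key] through DataFrame.where. A minimal check, reusing the question's df_mask (via_getitem, via_where and same_condition are illustrative names); it also shows that ~df_mask selects exactly the same cells as df_mask == False for a bool frame:

```python
import pandas as pd

df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False, True, False]})

# Indexing with a boolean DataFrame is delegated to DataFrame.where
via_getitem = df_mask[df_mask == False]
via_where = df_mask.where(df_mask == False)

# For a bool frame, ~df_mask is the same condition as df_mask == False
same_condition = (df_mask == False).equals(~df_mask)

# equals() compares dtypes too and treats NaNs in matching positions as equal
print(via_getitem.equals(via_where))  # True
print(same_condition)                  # True
```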

My question is about column CCC. Column BBB shows False (as I expect), but why are the values at index 1 and 3 in column CCC equal to 0 instead of False? Both print calls produce the same output:

   AAA    BBB  CCC
0  NaN  False  NaN
1  NaN  False    0
2  NaN  False  NaN
3  NaN  False    0
   AAA    BBB  CCC
0  NaN  False  NaN
1  NaN  False    0
2  NaN  False  NaN
3  NaN  False    0

Why doesn't it return a DataFrame that looks like this?

   AAA    BBB   CCC
0  NaN  False   NaN
1  NaN  False False
2  NaN  False   NaN
3  NaN  False False

1 answer


If you're looking for a quick fix to turn the result back into bools, you can cast it with astype(bool):

>>> df_bool = df_mask.where(df_mask == False).astype(bool)
>>> df_bool
    AAA    BBB    CCC
0  True  False   True
1  True  False  False
2  True  False   True
3  True  False  False
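Note that this output is identical to the original df_mask. NaN is truthy (bool(np.nan) is True), so astype(bool) turns every NaN into True instead of preserving the NaN/False pattern. A minimal check, reusing the question's df_mask (masked and df_bool are illustrative names):

```python
import numpy as np
import pandas as pd

df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False, True, False]})

masked = df_mask.where(df_mask == False)

# NaN is truthy, so casting to bool maps every NaN to True
print(bool(np.nan))  # True

df_bool = masked.astype(bool)

# The cast undoes the masking entirely: the original frame comes back
print(df_bool.equals(df_mask))  # True
```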

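This is because the returned DataFrame has different dtypes: it is no longer a bool DataFrame. where() puts NaN wherever the condition is False, and a bool column cannot hold NaN, so every column that receives at least one NaN (AAA and CCC) is upcast to float64. In a float64 column False is stored as 0.0, which is why CCC shows 0. BBB satisfies the condition everywhere, never receives a NaN, and keeps its bool dtype: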

>>> df2 = df_mask.where(df_mask == False)
>>> df2.dtypes
AAA    float64
BBB       bool
CCC    float64
dtype: object
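The 0 comes from NumPy-style type promotion, not from anything specific to CCC. A small sketch using only numpy and pandas (arr, ccc, ccc_out, bbb and bbb_out are illustrative names): mixing NaN with False forces a float64 array in which False becomes 0.0, while a column whose values all pass the condition is returned unchanged. The exact dtype pandas picks for the upcast column may vary between versions (float64 in the output above; some newer releases may use object), so the sketch only relies on it no longer being bool.

```python
import numpy as np
import pandas as pd

# NumPy has no bool representation for NaN, so mixing the two promotes the
# whole array to float64 and False is stored as 0.0
arr = np.array([np.nan, False, np.nan, False])
print(arr.dtype)  # float64

# Like column CCC: two values fail the condition, so where() must insert NaN
ccc = pd.Series([True, False, True, False])
ccc_out = ccc.where(ccc == False)
print(ccc_out.dtype)  # not bool; the exact dtype depends on the pandas version

# Like column BBB: every value passes the condition, nothing is replaced,
# so the column keeps its bool dtype
bbb = pd.Series([False] * 4)
bbb_out = bbb.where(bbb == False)
print(bbb_out.dtype)  # bool
```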

It happens even if you force the bool dtype from the get-go:

>>> df_mask = pd.DataFrame({'AAA': [True] * 4,
...                         'BBB': [False]*4,
...                         'CCC': [True, False, True, False]}, dtype=bool); print(df_mask)
    AAA    BBB    CCC
0  True  False   True
1  True  False  False
2  True  False   True
3  True  False  False
>>> df2 = df_mask.where(df_mask == False)
>>> df2
   AAA    BBB  CCC
0  NaN  False  NaN
1  NaN  False    0
2  NaN  False  NaN
3  NaN  False    0
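If the goal is the NaN/False frame from the question, one option (a sketch, not necessarily the only approach) is to convert the frame to a dtype that can hold a missing value next to False before calling where, so nothing needs to be upcast. Two variants below, assuming a pandas version with the nullable "boolean" dtype (1.0 or later); as_object and as_nullable are illustrative names:

```python
import pandas as pd

df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False, True, False]})

# Option 1: object columns can hold NaN next to Python bools,
# so False is kept as False instead of becoming 0.0
as_object = df_mask.astype(object).where(~df_mask)
print(as_object)

# Option 2: the nullable "boolean" dtype marks missing cells as pd.NA
# and the columns stay boolean
as_nullable = df_mask.astype('boolean').where(~df_mask)
print(as_nullable)
print(as_nullable.dtypes)
```

The object variant prints NaN/False as asked. The nullable variant shows <NA> instead of NaN but keeps a boolean dtype, which is usually easier to work with afterwards.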

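If you are particularly concerned about memory, astype also accepts copy=False, which lets it return a reference to the existing data instead of a copy where possible. Unless you are explicitly discarding the old reference (in which case it doesn't matter), be careful with that, since changes made through one object may show up in the other. See:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html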