Why doesn't "in" look for values when the Series contains strings?

This is probably very simple, but why doesn't "in" seem to work for Series containing objects or strings?

>>> import pandas as pd

>>> s = pd.Series(['a', 'b', 'c'])
>>> 'a' in s
False
>>> 'a' in s.astype('S1')
False

      

The documentation of Series.__contains__ is pretty sparse:

[In 1]: s.__contains__?
Signature: s.__contains__(key)
Docstring: True if the key is in the info axis
File:      c:\...\lib\site-packages\pandas\core\generic.py
Type:      method

      

My first thought was that "in" only checks the index:

>>> 1 in s
True

      

But then why does it (seem to) work with other types?

>>> 1.2 in pd.Series([1.3, 1.2])
True

>>> 1 in pd.Series([1.3, 1.2])  # also works for index
True

      


I don't need working solutions. I know I can just use whatever in s.values or np.any(s.eq(whatever)). I would like to know why it behaves this way (or am I missing something?).



1 answer


It behaves like this because a Series is more like an OrderedDict than a list.

Just as 1 in {0: 5, 1: 10} is True, so is 1 in pd.Series([5, 10]), because the index is RangeIndex(start=0, stop=2, step=1) and the items in the index act like keys.
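A minimal sketch of the dict analogy, using only the values from the examples above (the variable names are just for illustration). It shows that "in" tests the keys of a dict and the index labels of a Series, and that a value test has to go through s.values explicitly:

```python
import pandas as pd

d = {0: 5, 1: 10}
s = pd.Series([5, 10])                # default index: labels 0 and 1

# "in" looks at dict keys and at Series index labels
key_in_dict = 1 in d                  # 1 is a key
label_in_series = 1 in s              # 1 is an index label

# values are not consulted in either case
value_in_dict = 10 in d               # 10 is a value, not a key
value_in_series = 10 in s             # 10 is a value, not a label

# an explicit value test goes through the underlying array
value_in_values = 10 in s.values

# the same applies to the string Series from the question
strs = pd.Series(['a', 'b', 'c'])
str_as_label = 'a' in strs            # 'a' is not one of the labels 0, 1, 2
str_in_values = 'a' in strs.values

print(key_in_dict, label_in_series)                      # True True
print(value_in_dict, value_in_series, value_in_values)   # False False True
print(str_as_label, str_in_values)                       # False True
```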

I can see why the case

>>> 1.2 in pd.Series([1.3, 1.2])
True

      

might be a little confusing, but it's just a coincidence based on the numbers you chose: 1.2 is coerced to int before being compared against the RangeIndex or Int64Index, so you're really asking 1 in ser.index. I personally don't like this behavior, but that's what it does.



>>> 1.9 in pd.Series([1.3, 1.2])
True
>>> 1.2 in pd.Series([1.3, 1.2], index=[10, 20])
False
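To keep the label test and the value test apart, here is a small sketch using the non-default index from the example above. The variable names are illustrative. The sketch deliberately avoids the default-index float case, since the float-to-int coercion described here may not behave the same in every pandas version:

```python
import pandas as pd

ser = pd.Series([1.3, 1.2], index=[10, 20])

# "in" on the Series asks about index labels
label_hit = 10 in ser                   # 10 is a label
value_as_label = 1.2 in ser             # 1.2 is a value, not a label

# explicit value membership, two equivalent spellings
value_hit = 1.2 in ser.values
value_hit_eq = bool(ser.eq(1.2).any())

print(label_hit, value_as_label, value_hit, value_hit_eq)
# True False True True
```

Spelling out ser.index or ser.values makes the intent explicit and does not rely on how Series.__contains__ treats the key.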

      


To make the coercion even more obvious:

In [54]: np.inf in pd.Series([1.3, 1.2])
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-54-b069ecc5baf6> in <module>()
----> 1 np.inf in pd.Series([1.3, 1.2])

[...]
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.__contains__ (pandas/_libs/index.c:3924)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.__contains__ (pandas/_libs/hashtable.c:13569)()

OverflowError: cannot convert float infinity to integer

      
