Why doesn't "in" search the values when the Series contains strings?
This is probably very simple, but why doesn't the in operator seem to work for a Series containing objects or strings?
>>> import pandas as pd
>>> s = pd.Series(['a', 'b', 'c'])
>>> 'a' in s
False
>>> 'a' in s.astype('S1')
False
The documentation for Series.__contains__ is pretty sparse:
In [1]: s.__contains__?
Signature: s.__contains__(key)
Docstring: True if the key is in the info axis
File: c:\...\lib\site-packages\pandas\core\generic.py
Type: method
My first thought was that in only checks the index:
>>> 1 in s
True
But then why does it (seem to) work with other types:
>>> 1.2 in pd.Series([1.3, 1.2])
True
>>> 1 in pd.Series([1.3, 1.2]) # also works for index
True
I don't need a working solution; I know I can just use whatever in s.values or np.any(s.eq(whatever)). I would like to know why it behaves this way (or am I missing something?).
It behaves like this because the series is more like an OrderedDict than a list.
Just as 1 in {0: 5, 1: 10} is True, so is 1 in pd.Series([5, 10]), because the index is RangeIndex(start=0, stop=2, step=1) and the items in the index act like keys.
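The dict analogy can be checked without pandas at all. The following is only a minimal sketch: series_like is a hypothetical stand-in for pd.Series(['a', 'b', 'c']), built as a plain dict that maps positions to values.

```python
# Hypothetical stand-in for pd.Series(['a', 'b', 'c']):
# the positions are the keys, the strings are the values.
series_like = dict(enumerate(['a', 'b', 'c']))   # {0: 'a', 1: 'b', 2: 'c'}

key_hit = 1 in series_like                # True: 1 is a key (an index label)
value_as_key = 'a' in series_like         # False: 'a' is a value, not a key
value_hit = 'a' in series_like.values()   # True: search the values explicitly

print(key_hit, value_as_key, value_hit)
```

The last check plays the same role as the whatever in s.values workaround mentioned in the question.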
I can see why the case
>>> 1.2 in pd.Series([1.3, 1.2])
True
might be a little confusing, but it's just a coincidence based on the numbers you chose: 1.2 is coerced to int before being compared against the RangeIndex or Int64Index, so you're really asking whether 1 in ser.index. I personally don't like this behavior, but that's what it does.
>>> 1.9 in pd.Series([1.3, 1.2])
True
>>> 1.2 in pd.Series([1.3, 1.2], index=[10, 20])
False
To make the coercion even more obvious:
In [54]: np.inf in pd.Series([1.3, 1.2])
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-54-b069ecc5baf6> in <module>()
----> 1 np.inf in pd.Series([1.3, 1.2])
[...]
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.__contains__ (pandas/_libs/index.c:3924)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.__contains__ (pandas/_libs/hashtable.c:13569)()
OverflowError: cannot convert float infinity to integer
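The results above are all consistent with a lookup that converts the key to an int before testing it against integer labels. Here is a rough pure-Python imitation of that behavior. It is not the pandas implementation: coerced_contains is a hypothetical helper, and plain Python collections stand in for the index.

```python
def coerced_contains(labels, key):
    """Imitate a lookup that truncates the key to int before testing membership."""
    return int(key) in labels   # int() raises OverflowError for infinity


default_labels = range(2)       # plays the role of RangeIndex(start=0, stop=2, step=1)

r1 = coerced_contains(default_labels, 1.2)   # int(1.2) == 1, which is a label
r2 = coerced_contains(default_labels, 1.9)   # int(1.9) == 1 as well
r3 = coerced_contains([10, 20], 1.2)         # 1 is not among the labels 10 and 20

try:
    coerced_contains(default_labels, float('inf'))
    overflow = False
except OverflowError:
    # Same failure mode as the traceback: infinity has no integer value.
    overflow = True

print(r1, r2, r3, overflow)
```

Under this assumption the first two lookups succeed only because both floats truncate to the label 1, and the infinite key fails during conversion rather than during comparison.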