Error: Truth value of series is ambiguous - Python pandas
I know this question has been asked before, but when I try to use an if statement I get an error. I looked at the link, but it didn't help in my case. My dfs
is a list of DataFrames.
I am trying to do the following:
for i in dfs:
    if (i['var1'] < 3.000):
        print(i)
This gives the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
And I tried the following and got the same error.
for i,j in enumerate(dfs):
    if (j['var1'] < 3.000):
        print(i)
The datatype of my var1 column is float32. I am not combining conditions with logical operators (and, &, or, |). In the link above, the error seems to have been caused by the use of boolean operators. Why am I getting a ValueError?
Here's a small demo that shows why this is happening:
In [131]: df = pd.DataFrame(np.random.randint(0,20,(5,2)), columns=list('AB'))
In [132]: df
Out[132]:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15
In [133]: res = df['A'] > 10
In [134]: res
Out[134]:
0    False
1    False
2     True
3    False
4     True
Name: A, dtype: bool
When we try to check whether such a Series is True, pandas doesn't know what to do, because the Series holds several boolean values rather than one:
In [135]: if res:
     ...:     print(df)
     ...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
skipped
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Workarounds:
we can decide how a Series of boolean values should be treated. For example, the if statement can evaluate to True only when all of the values are True:
In [136]: res.all()
Out[136]: False
or when at least one of the values is True:
In [137]: res.any()
Out[137]: True
In [138]: if res.any():
     ...:     print(df)
     ...:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15
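Applied to the list of DataFrames from the question, the same idea might look like the sketch below. The sample data and the names has_small and all_small are made up for illustration; only dfs, the column name var1 and the threshold 3.0 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's list of DataFrames.
dfs = [
    pd.DataFrame({"var1": np.array([1.0, 5.0], dtype="float32")}),
    pd.DataFrame({"var1": np.array([4.0, 6.0], dtype="float32")}),
    pd.DataFrame({"var1": np.array([0.5, 2.0], dtype="float32")}),
]

# Indices of the frames where at least one value of var1 is below 3.0.
has_small = [i for i, df in enumerate(dfs) if (df["var1"] < 3.0).any()]

# Indices of the frames where every value of var1 is below 3.0.
all_small = [i for i, df in enumerate(dfs) if (df["var1"] < 3.0).all()]

print(has_small)
print(all_small)
```

Which of the two reductions is right depends on what the if statement is supposed to mean for a whole column, which the question does not say.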
You are currently comparing the entire Series at once, which produces a Series of booleans rather than a single True or False. To get an individual value from a Series, you want something along the lines of:
for i in dfs:
    if (i['var1'].iloc[0] < 3.000):
        print(i)
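To make the difference concrete, here is a small sketch with made-up data (the DataFrame and the names whole and first are hypothetical). Comparing the column yields a boolean Series, while comparing one element picked with .iloc[0] yields a single boolean that an if statement can use directly. Note that this only looks at the first row.

```python
import pandas as pd

# Hypothetical data; only the column name var1 comes from the question.
df = pd.DataFrame({"var1": [1.5, 4.0, 2.0]})

whole = df["var1"] < 3.0          # element-wise comparison: a boolean Series
first = df["var1"].iloc[0] < 3.0  # comparison on the first value only

print(type(whole).__name__)
print(bool(first))

if first:
    print("the first value of var1 is below 3.0")
```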
To compare each of the individual items, you can use series.items (the documentation is sparse on this one; older pandas versions also offered series.iteritems, which was removed in pandas 2.0):
for i in dfs:
    for _, v in i['var1'].items():
        if v < 3.000:
            print(v)
The best solution here for most cases is to select a subset of the dataframe to use for whatever you need, for example:
for i in dfs:
    subset = i[i['var1'] < 3.000]
    # do something with the subset
On large DataFrames, pandas performs much better when you operate on whole columns or slices at once instead of iterating over individual values. For more details, you can check the pandas selection documentation.
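As a rough sketch of that approach (the sample data and the names looped, subsets and selected are assumptions, not part of the question), the boolean-mask selection below gives the same values as a row-by-row loop, but with one comparison per DataFrame:

```python
import pandas as pd

# Hypothetical data; only the column name var1 comes from the question.
dfs = [
    pd.DataFrame({"var1": [1.0, 5.0, 2.5]}),
    pd.DataFrame({"var1": [4.0, 0.5]}),
]

# Row-by-row loop over individual values, kept here only for comparison.
looped = [v for df in dfs for v in df["var1"] if v < 3.0]

# Boolean-mask selection: one comparison per frame, no Python-level row loop.
subsets = [df[df["var1"] < 3.0] for df in dfs]
selected = pd.concat(subsets, ignore_index=True)

print(selected)
```

Using pd.concat to combine the subsets is just one option; each subset can equally be processed on its own inside the loop.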