Error: Truth value of series is ambiguous - Python pandas

I know this question has been asked before. However, when I try to use an if statement, I get an error. I looked at the link, but it didn't help in my case. My dfs is a list of DataFrames.

I am trying to do the following:

for i in dfs:
    if (i['var1'] < 3.000):
       print(i)

      

Gives the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

And I tried the following and got the same error.

for i,j in enumerate(dfs):
    if (j['var1'] < 3.000):
       print(i)

      

My var1 column has dtype float32. I am not using any other logical operators such as & or |. In the link above, the error seems to have been caused by the use of logical operators. Why am I getting a ValueError?


2 answers


Here's a small demo that shows why this is happening:

In [131]: df = pd.DataFrame(np.random.randint(0,20,(5,2)), columns=list('AB'))

In [132]: df
Out[132]:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15

In [133]: res = df['A'] > 10

In [134]: res
Out[134]:
0    False
1    False
2     True
3    False
4     True
Name: A, dtype: bool

      

When we try to check whether such a Series is True, pandas doesn't know what to do, because the Series holds several boolean values rather than one:

In [135]: if res:
     ...:     print(df)
     ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
skipped
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

      

Workarounds:

We can decide how a Series of booleans should be reduced to a single value. For example, the if statement could evaluate to True only when all the values are True:

In [136]: res.all()
Out[136]: False

      

or when at least one value is True:

In [137]: res.any()
Out[137]: True

In [138]: if res.any():
     ...:     print(df)
     ...:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15
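
Applied to the loop from the question, this might look like the sketch below. The three small frames in dfs are made-up sample data; only the column name var1 and the threshold 3.0 come from the question. Whether any() or all() is the right choice depends on what the condition is meant to express.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's list of DataFrames.
dfs = [
    pd.DataFrame({"var1": [1.0, 5.0]}, dtype="float32"),
    pd.DataFrame({"var1": [4.0, 6.0]}, dtype="float32"),
    pd.DataFrame({"var1": [0.5, 2.0]}, dtype="float32"),
]

# Indices of frames where at least one value of var1 is below 3.
any_below = [i for i, df in enumerate(dfs) if (df["var1"] < 3.0).any()]

# Indices of frames where every value of var1 is below 3.
all_below = [i for i, df in enumerate(dfs) if (df["var1"] < 3.0).all()]

print(any_below)
print(all_below)
```

Both reductions return a single boolean, so the if condition is no longer ambiguous.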

      


You are currently comparing the entire Series, which produces a Series of booleans rather than a single value. To compare an individual value from a Series, you want something along these lines:

for i in dfs:
    if i['var1'].iloc[0] < 3.000:
        print(i)
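
As a minimal sketch of why a single value works where the whole column does not: the frame below is hypothetical sample data, and the names first and is_small are introduced only for illustration.

```python
import pandas as pd

# Hypothetical one-column frame; the values are made up for illustration.
df = pd.DataFrame({"var1": [2.5, 7.0, 1.0]}, dtype="float32")

first = df["var1"].iloc[0]     # a single float32 scalar, not a Series
is_small = bool(first < 3.0)   # comparing scalars yields one unambiguous boolean

if is_small:
    print("first value is below 3:", first)
```

Note that this only inspects the first row; the remaining values of var1 play no part in the condition.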

      

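To compare each of the individual items, you can iterate with Series.items (called iteritems in older pandas versions; iteritems was removed in pandas 2.0):

for i in dfs:
    # items() yields (index label, value) pairs; the label is not needed here
    for _, v in i['var1'].items():
        if v < 3.000:
            print(v)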

      



The best solution here for most cases is to select a subset of the dataframe to use for whatever you need, for example:

for i in dfs:
    subset = i[i['var1'] < 3.000]
    # do something with the subset
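
A self-contained version of this approach could look like the following sketch. The sample frames and the subsets list are assumptions made for illustration; only the column name var1 and the threshold come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name var1 comes from the question.
dfs = [
    pd.DataFrame({"var1": [1.0, 5.0, 2.0]}),
    pd.DataFrame({"var1": [4.0, 6.0]}),
]

subsets = []
for df in dfs:
    mask = df["var1"] < 3.0   # boolean Series, one entry per row
    subset = df[mask]         # keeps only the rows where the mask is True
    if not subset.empty:      # .empty is a plain, unambiguous bool
        subsets.append(subset)

print(len(subsets))
print(subsets[0]["var1"].tolist())
```

Here the boolean Series is used as a row filter instead of an if condition, so no truth value is ever requested from it.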

      

On large DataFrames, pandas performs much better with vectorized operations on whole columns than with loops over individual values. For more details, see the pandas documentation on indexing and selecting data.
