The truth value of a Series is ambiguous: error when calling a function

I know that the following error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

was asked about a long time ago.

However, I am trying to create a basic function that returns a new column df['busy'] containing 1 or 0. My function looks like this:

def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0

      

I can define the function, but when I call it with the DataFrame, I get the error mentioned above. I followed one thread and another thread to create this function. I used & instead of and in my if statement.

Anyway, when I do the following I get the desired result.

df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
                        (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')

      

Any ideas on what mistake I am making in my function hour_bus?



1 answer


(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')

      

gives a boolean Series (a mask), and when you index your df with it, you get a (possibly) smaller portion of your df.

Just to illustrate what I mean:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0    False
# 1    False
# 2     True
# 3     True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
#    a
# 2  3
# 3  4

      



However, it's still a DataFrame, so it's ambiguous to use it in an expression that requires a single truth value (in your case, the if).

bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
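If a single True/False really is what you want in an if, you have to say how the many values should be reduced to one. A minimal sketch, reusing the toy df from above (the names any_match, all_match and has_rows are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2

# Reduce the boolean Series to one value explicitly.
any_match = bool(mask.any())    # is the condition met by at least one row?
all_match = bool(mask.all())    # is the condition met by every row?
has_rows = not df[mask].empty   # does the filtered DataFrame contain any rows?

if mask.any():
    print("at least one row has a > 2")
```

Note that none of these gives a per-row result, so they would not produce the busy column you are after; they only remove the ambiguity in the if.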

      

You can use np.where, as you already did, or something equivalent:

def hour_bus(df):
    mask = (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    res = df['busy'] == 0                             
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res

      

However, np.where would be a better solution (more readable and probably faster).
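If you still prefer to wrap it in a function, one possible sketch is to return the mask itself converted to integers, so the function works row by row without any if. This assumes that hour holds zero-padded 'HH:MM:SS' strings (as the string comparisons in your question suggest); the sample rows are made up for illustration:

```python
import pandas as pd

def hour_bus(df):
    """Return a 0/1 Series: 1 for weekday rows between 14:00:00 and 23:00:00."""
    mask = (
        (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
        & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    )
    # True/False -> 1/0, one value per row
    return mask.astype(int)

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '22:59:59', '16:00:00'],
    'week_day': ['Monday', 'Tuesday', 'Friday', 'Sunday'],
})
df['busy'] = hour_bus(df)
print(df)
```

Unlike your np.where call, which fills the column with the strings '1' and '0', this gives integers; use whichever type you actually need.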
