The truth value of a Series is ambiguous: error when calling a function
I know the following error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
has been asked about many times before.
However, I am trying to write a basic function that returns a new column df['busy'] containing 1 or 0. My function looks like this:
def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0
I can define the function, but when I call it with the DataFrame, I get the error mentioned above. I followed this thread and another thread to write the function. I used & instead of and in my if statement.
Anyway, when I do the following I get the desired result.
df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
(df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')
Any ideas on what mistake I am making in my function hour_bus?
(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
gives a boolean Series, and when you index df with it, you get a (possibly) smaller subset of df.
Just to illustrate what I mean:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0 False
# 1 False
# 2 True
# 3 True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
# a
# 2 3
# 3 4
However, it's still a DataFrame, so it's ambiguous to use it in an expression that requires a single truth value (in your case, the if).
bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
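If you really wanted a single yes/no answer about the whole frame, you would have to reduce the mask to one boolean yourself. A minimal sketch, reusing the toy df from above (the variable names any_match, all_match and has_rows are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2

# Each of these reduces the mask to a single, unambiguous bool
any_match = bool(mask.any())    # True if at least one row matches
all_match = bool(mask.all())    # True only if every row matches
has_rows = not df[mask].empty   # True if the filtered frame has any rows

if any_match:
    print("at least one row has a > 2")
```

Note that this answers one question about the whole frame. It does not give one value per row, which is what a new busy column needs.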
You can use np.where, as you already did, or an equivalent function:
def hour_bus(df):
    mask = (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    res = pd.Series(0, index=df.index)  # start with 0 (not busy) for every row
    res[mask] = 1  # replace the values where the mask is True
    return res
However, np.where would be the better solution (more readable and probably faster).
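For completeness, here is a minimal sketch of the np.where version wrapped in a function and run on a small made-up DataFrame. The sample rows are an assumption for illustration; the column names come from the question. It returns the integers 1 and 0 rather than the strings '1' and '0'. Keep in mind that comparing the times as strings only works because they are zero-padded HH:MM:SS values.

```python
import numpy as np
import pandas as pd

def hour_bus(df):
    # Element-wise mask: between 14:00 and 23:00 (inclusive) and not a weekend day
    mask = (
        (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
        & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    )
    # One value per row: 1 where the mask is True, 0 elsewhere
    return np.where(mask, 1, 0)

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '18:45:00', '23:30:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday'],
})
df['busy'] = hour_bus(df)
print(df['busy'].tolist())  # [0, 1, 0, 0]
```

Only the second row is flagged: the first and last fall outside the time window, and the third is a Saturday.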