Pandas - Groupby with conditional formula

Question

   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1

Given the dataframe above, is there an elegant way to use groupby with a condition? I want to split the data into two groups based on the following conditions:

(df['SibSp'] > 0) | (df['Parch'] > 0)   = New Group - "Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"

then take the mean of Survived for each of these groups and get a result similar to the following:

               SurvivedMean
 Has Family    Mean
 No Family     Mean

Can this be done with groupby or do I need to add a new column using the above conditional?

+5

python pandas conditional dataframe conditional-statements pandas-groupby

George Vince 13 jul. 17 at 14:03


3 answers

A single condition is enough if the SibSp and Parch columns never contain values below 0:

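import numpy as np

# True for every row with at least one sibling/spouse or parent/child
m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)

# Group by the label array directly; the result goes in a new name so df is not overwritten
result = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print(result)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64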

If negative values can occur, use both conditions and give the rows that match neither a label of their own:

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
# Rows matching neither condition are labelled 'Not'
a = np.where(m1, 'Has Family',
    np.where(m2, 'No Family', 'Not'))

result = df.groupby(a)['Survived'].mean()
print(result)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
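The nested np.where calls above can become hard to read once there are more than two conditions. One possible alternative, not part of the original answer, is NumPy's np.select, which takes a list of conditions and a matching list of labels. The sketch below rebuilds the question's five rows so it runs on its own; the names has_family, no_family, labels and survived_mean are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# The five rows shown in the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 1],
})

has_family = (df["SibSp"] > 0) | (df["Parch"] > 0)
no_family = (df["SibSp"] == 0) & (df["Parch"] == 0)

# The first matching condition wins; rows matching neither get the default label
labels = np.select(
    [has_family, no_family],
    ["Has Family", "No Family"],
    default="Not",
)

survived_mean = df.groupby(labels)["Survived"].mean()
print(survived_mean)
```

With this data no row falls through to the default, so only the two family groups appear in the result.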

+1

jezrael 13 jul. 17 at 14:10


You can define your conditions in a list and use the function group_by_condition below to create a filtered list for each condition. Note that this works on a plain list of dicts rather than on a DataFrame. After that, you can pick out the resulting lists by unpacking:

df = [
  {"Survived": 0, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 0, "Parch": 0}]

conditions = [
  lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),  # has family
  lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0)  # no family
]

def group_by_condition(items, conditions):
    # One filtered list per condition, in the same order as the conditions
    return [[item for item in items if condition(item)] for condition in conditions]

has_family, no_family = group_by_condition(df, conditions)
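The answer stops at the two filtered lists, while the question asks for the mean of Survived per group. A minimal sketch of that last step, assuming the data is a plain list of dicts as above and using the standard library's statistics.mean; the variable names rows and survived_mean are chosen here for illustration, and the list holds all five rows from the question.

```python
from statistics import mean

# All five rows from the question, as a list of dicts
rows = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 0, "SibSp": 0, "Parch": 1},
]

conditions = [
    lambda x: (x["SibSp"] > 0) or (x["Parch"] > 0),    # has family
    lambda x: (x["SibSp"] == 0) and (x["Parch"] == 0),  # no family
]

def group_by_condition(items, conditions):
    # One filtered list per condition, in the same order as the conditions
    return [[item for item in items if condition(item)] for condition in conditions]

has_family, no_family = group_by_condition(rows, conditions)

# Mean of Survived per group; this assumes neither group is empty
survived_mean = {
    "Has Family": mean(row["Survived"] for row in has_family),
    "No Family": mean(row["Survived"] for row in no_family),
}
print(survived_mean)
```

statistics.mean raises an error on an empty group, so a real dataset might need a guard for that case.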

+1

Zwackelmann 13 jul. 17 at 14:23


ayhan · Accepted Answer · 2017-07-13T14:12:58+0000

An easy way to group is to use the sum of these two columns. Assuming neither column contains negative values, if either of them is positive the sum will be at least 1. And groupby accepts an arbitrary array as long as its length matches the length of the DataFrame, so you don't need to add a new column.

family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out: 
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64
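For the output layout sketched in the question, with a single SurvivedMean column, the same idea can be written with pandas only. This is a sketch under the assumption that a boolean Series mapped to labels is an acceptable group key; has_family, labels and result are names chosen here, and the DataFrame is rebuilt from the question's five rows so the example runs on its own.

```python
import pandas as pd

# The five rows shown in the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 1],
})

# True where at least one of the two columns is positive
has_family = (df[["SibSp", "Parch"]] > 0).any(axis=1)

# Map the boolean mask to readable labels; no column is added to df
labels = has_family.map({True: "Has Family", False: "No Family"})

# Group by the label Series and name the resulting column as in the question
result = df.groupby(labels)["Survived"].mean().to_frame("SurvivedMean")
print(result)
```

Because the labels live in a separate Series, df keeps its original three columns.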
