How to compare rows in a dataframe in pandas

Question

I would like to compare 2 rows where the ID numbers are the same (e.g. rows 0 and 1) and then delete the row with the smaller absolute income. Is there a way to do this using only pandas functions, without looping through the rows using .itertuples()? I've been thinking about using .shift and .apply, but I'm not sure how to accomplish this.

 Index   ID             Income  
 0       2011000070      55019   
 1       2011000070          0   
 2       2011000074      23879   
 3       2011000074          0   
 4       2011000078          0   
 5       2011000078          0   
 6       2011000118     -32500   
 7       2011000118          0

I want to get:

 Index   ID             Income  
 0       2011000070      55019     
 2       2011000074      23879     
 4       2011000078          0     
 6       2011000118     -32500

+3

python pandas

stav 10 jul. 17 at 17:35

3 answers

Using pandas.DataFrame.drop_duplicates, plus sorting by ID and by the absolute value of Income, should solve your problem. Its keep parameter defaults to "first", which is what you need.

df['Income_abs'] = df['Income'].apply(abs)

df.sort_values(['ID', 'Income_abs'], ascending=[True,False]).drop_duplicates(['ID']).drop('Income_abs',axis=1)
Out[26]: 
   Index          ID  Income
0      0  2011000070   55019
2      2  2011000074   23879
4      4  2011000078       0
6      6  2011000118  -32500
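
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the helper column is swapped for sorting the absolute values directly; the names order and result are just placeholders for illustration. It assumes a stable sort (kind="mergesort") so that tied rows keep their original order.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [2011000070, 2011000070, 2011000074, 2011000074,
           2011000078, 2011000078, 2011000118, 2011000118],
    "Income": [55019, 0, 23879, 0, 0, 0, -32500, 0],
})

# Row labels ordered by absolute income, largest first
order = df["Income"].abs().sort_values(ascending=False, kind="mergesort").index

# Keep the first (largest |Income|) row per ID, then restore the original row order
result = df.loc[order].drop_duplicates("ID").sort_index()
print(result)
```

This avoids adding and dropping a temporary Income_abs column, at the cost of one extra sort_index call to put the rows back in their original order.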

+1

blacksite 10 jul. 17 at 17:39

This might work:

In [458]: df.groupby('ID', as_index=False).apply(lambda x: x.loc[x.Income.abs().idxmax()])
Out[458]:
   Index          ID  Income
0      0  2011000070   55019
1      2  2011000074   23879
2      4  2011000078       0
3      6  2011000118  -32500

+1

Zero 10 jul. 17 at 17:41

jezrael · Accepted Answer · 2017-07-10T17:40:50+0000

You need DataFrameGroupBy.idxmax with Series.abs to get the indices of the maximum absolute values, and then select those rows with loc:

print (df.groupby('ID')['Income'].apply(lambda x: x.abs().idxmax()))
ID
2011000070    0
2011000074    2
2011000078    4
2011000118    6
Name: Income, dtype: int64

df = df.loc[df.groupby('ID')['Income'].apply(lambda x: x.abs().idxmax())]
print (df)
   Index          ID  Income
0      0  2011000070   55019
2      2  2011000074   23879
4      4  2011000078       0
6      6  2011000118  -32500

Alternative solution:

df = df.loc[df['Income'].abs().groupby(df['ID']).idxmax()]
print (df)
   Index          ID  Income
0      0  2011000070   55019
2      2  2011000074   23879
4      4  2011000078       0
6      6  2011000118  -32500
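
As a self-contained sketch of the alternative solution, the snippet below rebuilds the sample frame from the question and runs the same idxmax selection. The variable names keep and result are only for illustration. Note that idxmax returns the first label when values tie, which is why row 4 rather than row 5 is kept for ID 2011000078.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [2011000070, 2011000070, 2011000074, 2011000074,
           2011000078, 2011000078, 2011000118, 2011000118],
    "Income": [55019, 0, 23879, 0, 0, 0, -32500, 0],
})

# For each ID, the index label of the row with the largest absolute income
keep = df["Income"].abs().groupby(df["ID"]).idxmax()

# Select those rows; the original (signed) Income values are preserved
result = df.loc[keep]
print(result)
```

Because the absolute value is only used to pick the labels, the negative income for ID 2011000118 comes through unchanged in the result.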
