Get row and column names (argmax) for maximum record in pandas dataframe

df.idxmax() returns the label of the maximum along one axis (rows or columns), but I want an arg_max(df) over the complete dataframe that returns a tuple (row, column).

The use case I have in mind is feature selection, where I have a correlation matrix and want to "recursively" remove the features with the highest correlation. I preprocess the correlation matrix to take its absolute values and set the diagonal entries to -1. Then I propose to use rec_drop, which recursively removes one feature from the pair with the highest correlation (subject to a cutoff, max_allowed_correlation) and returns the final list of features. For example:

S = S.abs()
np.fill_diagonal(S.values, -1)  # so that max can't be on the diagonal now
S = rec_drop(S, max_allowed_correlation=0.95)

def rec_drop(S, max_allowed_correlation=0.99):
    max_corr = S.max().max()
    if max_corr < max_allowed_correlation:  # base case for recursion
        return S.columns.tolist()
    row, col = arg_max(S)  # row and col are distinct features - max can't be on the diagonal
    S = S.drop(row).drop(row, axis=1)  # removing one of the features from S
    return rec_drop(S, max_allowed_correlation)

      

1 answer


Assuming your entire pandas table is numeric, you can convert it to its numpy representation and find the position of the maximum there. However, numpy's argmax works on the flattened data, so a little extra work is needed to recover the row and column:

# Synthetic data
>>> table = pd.DataFrame(np.random.rand(5,3))
>>> table
          0         1         2
0  0.367720  0.235935  0.278112
1  0.645146  0.187421  0.324257
2  0.644926  0.861077  0.460296
3  0.035064  0.369187  0.165278
4  0.270208  0.782411  0.690871

[5 rows x 3 columns]

Convert the table to a numpy array and compute the argmax:

>>> data = table.to_numpy()  # as_matrix() was removed in pandas 1.0
>>> amax = data.argmax()  # flat index, 7 in this case
>>> row, col = (amax // data.shape[1], amax % data.shape[1])
>>> row, col
(2, 1)
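
Note that (2, 1) above is a pair of positions, while the question asks for row and column names. The following is a minimal sketch of such an arg_max, assuming a single-level index and columns. It uses np.unravel_index in place of the manual // and % arithmetic and then looks up the labels. The small matrix S and its values are made up for illustration.

```python
import numpy as np
import pandas as pd


def arg_max(df):
    """Return the (row label, column label) of the largest entry in df."""
    values = df.to_numpy()
    # argmax gives a position in the flattened array; unravel_index
    # converts it back into a (row position, column position) pair.
    row_pos, col_pos = np.unravel_index(values.argmax(), values.shape)
    # Translate positions into the dataframe's own labels.
    return df.index[row_pos], df.columns[col_pos]


# Hypothetical correlation-like matrix with the diagonal already set to -1.
S = pd.DataFrame(
    [[-1.0, 0.2, 0.97],
     [0.2, -1.0, 0.5],
     [0.97, 0.5, -1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)

result = arg_max(S)
print(result)
```

For a symmetric matrix the maximum appears twice; argmax reports the first occurrence in row-major order, so here the pair is ("a", "c") rather than ("c", "a"). With a single-level index, S.stack().idxmax() should give the same pair without going through numpy.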

      
