Replace specific values in a matrix with Python

Question

I have an m×n matrix where each row is a pattern and each column is a class. Each row contains the softmax probabilities for each class. I want to replace the maximum value in each row with 1 and all other values with 0. How can I do this efficiently in Python?

+3

python numpy sparse-matrix machine-learning classification

Matrix 04 oct. 14 at 22:47

3 answers

Some made-up data:

>>> import numpy as np
>>> a = np.random.rand(5, 5)
>>> a
array([[ 0.06922196,  0.66444783,  0.2582146 ,  0.03886282,  0.75403153],
       [ 0.74530361,  0.36357237,  0.3689877 ,  0.71927017,  0.55944165],
       [ 0.84674582,  0.2834574 ,  0.11472191,  0.29572721,  0.03846353],
       [ 0.10322931,  0.90932896,  0.03913152,  0.50660894,  0.45083403],
       [ 0.55196367,  0.92418942,  0.38171512,  0.01016748,  0.04845774]])

In one line:

>>> (a == a.max(axis=1)[:, None]).astype(int)
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]])

More efficient (and verbose) approach:

>>> b = np.zeros_like(a, dtype=int)
>>> b[np.arange(a.shape[0]), np.argmax(a, axis=1)] = 1
>>> b
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]])
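
One difference between the two approaches is worth noting: if a row contains tied maxima, the comparison one-liner marks every tied entry, while argmax picks only the first. A minimal sketch of that behaviour, where `one_hot_argmax` and the small `ties` array are hypothetical names used only for illustration:

```python
import numpy as np

def one_hot_argmax(a):
    """Return an int array with a single 1 per row, at the row's first maximum."""
    a = np.asarray(a)
    b = np.zeros_like(a, dtype=int)
    b[np.arange(a.shape[0]), np.argmax(a, axis=1)] = 1
    return b

# Illustrative input with a tie in the first row.
ties = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.2, 0.7]])

by_comparison = (ties == ties.max(axis=1)[:, None]).astype(int)
by_argmax = one_hot_argmax(ties)

print(by_comparison)  # the first row contains two 1s
print(by_argmax)      # the first row contains a single 1
```

With softmax outputs exact ties are unlikely, but if exactly one class per row is required, the argmax version is the safer choice.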

+2

Jaime 04 oct. 14 at 23:48

This approach, using basic numpy and a list comprehension, works but is the least performant of the answers here. I am leaving it up because it might be somewhat instructive. First, we create a numpy matrix:

matrix = np.matrix(np.random.randn(2,2))

matrix is, for example:

matrix([[-0.84558168,  0.08836042],
        [-0.01963479,  0.35331933]])

Now build a new matrix with 1 where the element is the row maximum and 0 otherwise:

newmatrix = np.matrix([[1 if i == row.max() else 0 for i in row]
                       for row in np.array(matrix)])

newmatrix is now:

matrix([[0, 1],
        [0, 1]])
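
As a sanity check, one could confirm that the list-comprehension version agrees with a vectorized argmax version. The sketch below assumes there are no ties among the row maxima (which holds with probability one for continuous random data) and uses a plain ndarray instead of np.matrix; `values`, `slow`, and `fast` are illustrative names, and the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
values = rng.standard_normal((4, 3))

# List-comprehension version: compare each element to its row maximum.
slow = np.array([[1 if x == row.max() else 0 for x in row] for row in values])

# Vectorized version: write a 1 at each row's argmax via fancy indexing.
fast = np.zeros(values.shape, dtype=int)
fast[np.arange(values.shape[0]), values.argmax(axis=1)] = 1

print(np.array_equal(slow, fast))
```

The list comprehension recomputes row.max() for every element, which is one reason it scales poorly on large inputs.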

0

Aaron Hall 04 oct. 14 at 23:11

Aaron hall · Accepted Answer · 2014-10-05T01:51:08+0000

I think the best answer to your specific question is to use a matrix type object.

A sparse matrix should be the most memory-friendly way to store many of these large matrices, given that most of each matrix is filled with zeros. For matrices that are very large in both dimensions, this should beat numpy arrays in terms of memory, if not in terms of computation speed.

import numpy as np
import scipy.sparse   # explicit submodule import; plain `import scipy` may not expose it

matrix = np.matrix(np.random.randn(10, 5))
# Column index of the maximum in each row, flattened to 1-D.
# (.A[:, 0] was slightly faster, but .A1 seems more readable.)
maxes = matrix.argmax(axis=1).A1
n_rows = len(matrix)  # could use matrix.shape[0], but that's slower
data = np.ones(n_rows)
row = np.arange(n_rows)
# One nonzero entry per row, at (row, column of that row's maximum).
sparse_matrix = scipy.sparse.coo_matrix((data, (row, maxes)),
                                        shape=matrix.shape,
                                        dtype=np.int8)

This sparse_matrix object should be very lightweight relative to a regular matrix object that would uselessly keep track of every zero in it. To materialize it as a normal matrix:

sparse_matrix.todense()

returns:

matrix([[0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0]], dtype=int8)

which we can compare with matrix:

matrix([[ 1.41049496,  0.24737968, -0.70849012,  0.24794031,  1.9231408 ],
        [-0.08323096, -0.32134873,  2.14154425, -1.30430663,  0.64934781],
        [ 0.56249379,  0.07851507,  0.63024234, -0.38683508, -1.75887624],
        [-0.41063182,  0.15657594,  0.11175805,  0.37646245,  1.58261556],
        [ 1.10421356, -0.26151637,  0.64442885, -1.23544526, -0.91119517],
        [ 0.51384883,  1.5901419 ,  1.92496778, -1.23541699,  1.00231508],
        [-2.42759787, -0.23592018, -0.33534536,  0.17577329, -1.14793293],
        [-0.06051458,  1.24004714,  1.23588228, -0.11727146, -0.02627196],
        [ 1.66071534, -0.07734444,  1.40305686, -1.02098911, -1.10752638],
        [ 0.12466003, -1.60874191,  1.81127175,  2.26257234, -1.26008476]])
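
To get a rough feel for the memory argument made earlier, one could compare the bytes held by the COO arrays with those of the equivalent dense int8 array. This is only a sketch: the sizes, the seed, and the names `scores`, `one_hot`, `sparse_bytes`, and `dense_bytes` are assumptions made for illustration, and the exact figures depend on the index dtype SciPy chooses.

```python
import numpy as np
import scipy.sparse

n_rows, n_cols = 1000, 50               # illustrative sizes, not from the answer
rng = np.random.default_rng(0)          # arbitrary seed
scores = rng.standard_normal((n_rows, n_cols))

# One-hot encoding of each row's maximum, stored in COO format.
maxes = scores.argmax(axis=1)
one_hot = scipy.sparse.coo_matrix(
    (np.ones(n_rows, dtype=np.int8), (np.arange(n_rows), maxes)),
    shape=scores.shape,
)

# COO keeps three parallel arrays: values, row indices, column indices.
sparse_bytes = one_hot.data.nbytes + one_hot.row.nbytes + one_hot.col.nbytes
dense_bytes = one_hot.toarray().nbytes  # dense int8: one byte per cell

print(sparse_bytes, dense_bytes)
```

The saving grows with the number of columns. For a very narrow matrix, such as the 10x5 example above, the index arrays can take more space than a dense int8 array would, so the sparse form mainly pays off when there are many classes.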
