Binarize sparse matrix in python differently

Suppose I have a matrix like:

4 0 3 5
0 2 6 0
7 0 1 0

      

I want it to be binarized like:

0 0 0 0
0 1 0 0
0 0 1 0

      

The threshold is 2: any item above the threshold is set to 0, and any item less than or equal to the threshold (except 0) is set to 1.

Can this be done with a SciPy csr_matrix or any other sparse matrix type?

I know scikit-learn offers Binarizer, which replaces values below or equal to the threshold with 0 and values above it with 1.



4 answers


If you are dealing with a sparse matrix s, avoid inequalities that include zero. A sparse matrix (if you are using it appropriately) has a large number of zeros, so an array of every position that is zero would be huge. So avoid s <= 2, for example. Use inequalities that select away from zero instead.

import numpy as np
from scipy import sparse

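s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))

print(repr(s))
# <3x4 sparse matrix of type '<class 'numpy.int64'>'
#   with 7 stored elements in Compressed Sparse Row format>
# (the exact wording depends on the SciPy version)

s[s > 2] = 0   # entries above the threshold become 0
s[s != 0] = 1  # every remaining nonzero entry becomes 1

print(s.todense())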

      



gives

[[0 0 0 0]
 [0 1 0 0]
 [0 0 1 0]]
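
The same idea of touching only the stored entries can also be expressed by working on the CSR .data array directly. The following is a minimal sketch rather than part of the answer above: the helper name binarize_at_most is hypothetical, and it assumes you want a new matrix back instead of modifying s in place.

```python
import numpy as np
from scipy import sparse


def binarize_at_most(s, threshold):
    """Return a CSR copy: nonzero entries <= threshold become 1, the rest 0."""
    out = sparse.csr_matrix(s, copy=True)  # leave the caller's matrix untouched
    # Only the stored values are inspected, so implicit zeros are never visited.
    keep = (out.data != 0) & (out.data <= threshold)
    out.data = keep.astype(out.dtype)
    out.eliminate_zeros()  # drop entries that were just set to 0
    return out


s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))
result = binarize_at_most(s, 2)
print(result.toarray())
```

Calling eliminate_zeros() at the end keeps the result compact: only the two entries that became 1 remain stored.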

      



You can use numpy.where for this, after converting the matrix to dense form:



>>> import numpy as np
>>> import scipy.sparse
>>> mat = scipy.sparse.csr_matrix(np.array([[4, 0, 3, 5],
...                                         [0, 2, 6, 0],
...                                         [7, 0, 1, 0]])).todense()
>>> np.where(np.logical_and(mat <= 2, mat !=0), 1, 0)
matrix([[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]])
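
Keep in mind that todense() materializes every zero, so this route is only practical when the matrix fits in memory as a dense array. Under that assumption, here is a minimal sketch of the same numpy.where approach that uses toarray() and converts the result back to CSR; the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))

dense = s.toarray()  # plain ndarray; this allocates every zero
# 1 where the value is nonzero and at most 2, otherwise 0
binary = np.where((dense <= 2) & (dense != 0), 1, 0)
result = sparse.csr_matrix(binary)  # back to sparse storage
print(result.toarray())
```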

      



There may be a more efficient way to do this, but it can also be achieved with a simple function and plain lists, as shown below:

def binarized(matrix, threshold):
    # Modifies the nested list in place and returns it.
    for row in matrix:
        # Iterate over the columns of this row (not the number of rows).
        for i in range(len(row)):
            if row[i] > threshold:
                row[i] = 0
            elif row[i] != 0:
                row[i] = 1
    return matrix


matrix = [[4, 0, 3, 5],
          [0, 2, 6, 0],
          [7, 0, 1, 0]]

print(binarized(matrix, 2))

      

Yields:

[[0, 0, 0, 0],
 [0, 1, 0, 0],
 [0, 0, 1, 0]]
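
If the input list should be left unchanged, the same rule can be written as a nested list comprehension that builds a new list. This is a minimal sketch; the function name binarized_copy is hypothetical and not part of the answer above.

```python
def binarized_copy(matrix, threshold):
    """Return a new nested list; the input is not modified."""
    return [[1 if value != 0 and value <= threshold else 0 for value in row]
            for row in matrix]


matrix = [[4, 0, 3, 5],
          [0, 2, 6, 0],
          [7, 0, 1, 0]]

result = binarized_copy(matrix, 2)
print(result)
```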

      



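import numpy as np

x = np.array([[4, 0, 3, 5],
              [0, 2, 6, 0],
              [7, 0, 1, 0]])

threshold = 2
x[x <= 0] = threshold + 1  # push zeros above the threshold so they end up as 0
x[x <= threshold] = 1      # nonzero values at or below the threshold become 1
x[x > threshold] = 0       # everything above the threshold becomes 0
print(x)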

      

output:

[[0 0 0 0]
 [0 1 0 0]
 [0 0 1 0]]

      
