How can I eliminate zeros in a sparse matrix in (Python)?

Question

I need a sparse matrix (I use the compressed sparse row (CSR) format from scipy.sparse) to do some calculations. I have it as a tuple (data, (row, col)). Unfortunately, some rows and columns will be entirely zero, and I would like to get rid of them. Right now I have:

[In]:
     from scipy.sparse import csr_matrix
     aa = csr_matrix(((1, 2, 3), ((0, 2, 2), (0, 1, 2))))
     aa.todense()
[Out]:
     matrix([[1, 0, 0],
             [0, 0, 0],
             [0, 2, 3]], dtype=int64)

And I would like:

[Out]:
    matrix([[1, 0, 0],
            [0, 2, 3]], dtype=int64)

After calling the method eliminate_zeros() on the object, I get None:

[In]:
     aa2 = csr_matrix.eliminate_zeros(aa)
     type(aa2)
[Out]:
     <class 'NoneType'>

Why does this method turn it to None?

Is there some other way to get a sparse matrix (not necessarily CSR) and get rid of empty rows / columns easily?

I am using Python 3.4.0.

python-3.x scipy sparse-matrix

potockan 30 jul. 15 at 19:27

1 answer

Jaime · Accepted Answer · 2015-07-30T20:28:18+0000

In CSR format, it is relatively easy to get rid of all-zero rows:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = sps.csr_matrix([[1, 0, 0], [0, 0, 0], [0, 2, 3]])
>>> a.indptr
array([0, 1, 1, 3])
>>> mask = np.concatenate(([True], a.indptr[1:] != a.indptr[:-1]))
>>> mask  # 1st occurrence of unique a.indptr entries
array([ True,  True, False,  True], dtype=bool)
>>> sps.csr_matrix((a.data, a.indices, a.indptr[mask])).toarray()
array([[1, 0, 0],
       [0, 2, 3]])

Then you can convert your sparse matrix to CSC format, and the same trick will get rid of all-zero columns.
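As a rough sketch of that CSC step: the helper name drop_empty_columns and the example matrix b are made up for illustration, and the explicit shape argument is my own addition so that empty rows at the bottom of the matrix are not lost when the shape is inferred. Note that a column holding only explicitly stored zeros still counts as non-empty here.

```python
import numpy as np
import scipy.sparse as sps


def drop_empty_columns(m):
    """Hypothetical helper: return a CSC matrix without the empty columns of m."""
    c = sps.csc_matrix(m)
    # In CSC, indptr[j]:indptr[j + 1] delimits column j, so two equal
    # consecutive indptr entries mean that column stores nothing.
    mask = np.concatenate(([True], c.indptr[1:] != c.indptr[:-1]))
    n_cols = int(mask.sum()) - 1
    # Keep the original number of rows instead of letting it be inferred.
    return sps.csc_matrix((c.data, c.indices, c.indptr[mask]),
                          shape=(c.shape[0], n_cols))


# Example matrix (an assumption, not from the question): column 1 is empty.
b = sps.csr_matrix([[1, 0, 2], [0, 0, 0], [3, 0, 4]])
result = drop_empty_columns(b)
print(result.toarray())
```

Applying the row trick first and this helper afterwards should remove both the empty rows and the empty columns.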

I'm not sure how well it will perform, but a much more readable syntax:

>>> a[a.getnnz(axis=1) != 0][:, a.getnnz(axis=0) != 0].toarray()
array([[1, 0, 0],
       [0, 2, 3]])

also works.
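As for the None: eliminate_zeros() modifies the matrix in place and returns nothing, so assigning its result gives None. It also only removes explicitly stored zero entries; it does not change the shape, so empty rows and columns stay. A minimal sketch combining both points, where drop_empty is a hypothetical helper name and the example data (with one stored zero) is chosen only for illustration:

```python
import scipy.sparse as sps


def drop_empty(m):
    """Hypothetical helper: drop rows and columns with no stored entries."""
    m = sps.csr_matrix(m)
    keep_rows = m.getnnz(axis=1) != 0  # rows with at least one stored entry
    keep_cols = m.getnnz(axis=0) != 0  # columns with at least one stored entry
    return m[keep_rows][:, keep_cols]


# Illustrative data: the entry at (2, 1) is an explicitly stored zero.
aa = sps.csr_matrix(((1, 0, 3), ((0, 2, 2), (0, 1, 2))), shape=(3, 3))
nnz_before = aa.nnz

returned = aa.eliminate_zeros()  # works in place, returns None
nnz_after = aa.nnz               # the stored zero is gone
shape_after = aa.shape           # the shape is unchanged

trimmed = drop_empty(aa)
print(returned, nnz_before, nnz_after, shape_after)
print(trimmed.toarray())
```

Calling eliminate_zeros() before drop_empty matters when the matrix may hold stored zeros, since getnnz counts stored entries rather than nonzero values.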
