How can I eliminate zero rows and columns in a sparse matrix (Python)?
I need a sparse matrix (I use the compressed sparse row (CSR) format from scipy.sparse) to do some calculations. I have it as a tuple (data, (row, col)). Unfortunately, some rows and columns will be entirely zero, and I would like to get rid of them. Right now I have:
[In]:
from scipy.sparse import csr_matrix
aa = csr_matrix(((1, 2, 3), ((0, 2, 2), (0, 1, 2))))
aa.todense()
[Out]:
matrix([[1, 0, 0],
        [0, 0, 0],
        [0, 2, 3]], dtype=int64)
And I would like:
[Out]:
matrix([[1, 0, 0],
        [0, 2, 3]], dtype=int64)
After using the method eliminate_zeros() on the object, I get None:
[In]:
aa2 = csr_matrix.eliminate_zeros(aa)
type(aa2)
[Out]:
<class 'NoneType'>
Why does this method return None?
Is there some other way to get a sparse matrix (not necessarily CSR) and get rid of empty rows / columns easily?
I am using Python 3.4.0.
In CSR format, it is relatively easy to get rid of all empty rows:
>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = sps.csr_matrix([[1, 0, 0], [0, 0, 0], [0, 2, 3]])
>>> a.indptr
array([0, 1, 1, 3])
>>> mask = np.concatenate(([True], a.indptr[1:] != a.indptr[:-1]))
>>> mask # 1st occurrence of unique a.indptr entries
array([ True, True, False, True], dtype=bool)
>>> sps.csr_matrix((a.data, a.indices, a.indptr[mask])).toarray()
array([[1, 0, 0],
       [0, 2, 3]])
You can then convert the sparse matrix to CSC format, where the same trick gets rid of all empty columns.
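A minimal sketch of that row-then-column pass. The helper names `drop_empty_rows` and `drop_empty_cols` are made up for illustration and are not SciPy functions. The sketch treats a row or column as empty only when it has no stored entries, so explicitly stored zeros still count as entries. The shape is passed explicitly so that the other dimension is not re-inferred from the largest index.

```python
import numpy as np
import scipy.sparse as sps

def drop_empty_rows(m):
    """Drop rows with no stored entries (hypothetical helper)."""
    m = sps.csr_matrix(m)
    # Keep the first occurrence of each run of equal indptr values.
    keep = np.concatenate(([True], m.indptr[1:] != m.indptr[:-1]))
    n_rows = int(keep.sum()) - 1
    return sps.csr_matrix((m.data, m.indices, m.indptr[keep]),
                          shape=(n_rows, m.shape[1]))

def drop_empty_cols(m):
    """Drop columns with no stored entries (hypothetical helper)."""
    m = sps.csc_matrix(m)  # in CSC, indptr delimits columns instead of rows
    keep = np.concatenate(([True], m.indptr[1:] != m.indptr[:-1]))
    n_cols = int(keep.sum()) - 1
    return sps.csc_matrix((m.data, m.indices, m.indptr[keep]),
                          shape=(m.shape[0], n_cols))

# Example: row 1 and column 1 are empty.
b = sps.csr_matrix([[1, 0, 0], [0, 0, 0], [0, 0, 3]])
trimmed = drop_empty_cols(drop_empty_rows(b))
print(trimmed.toarray())
```

The result comes back in CSC format; call `.tocsr()` on it if the later calculations need CSR.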
I'm not sure how well it performs, but this much more readable syntax:
>>> a[a.getnnz(axis=1) != 0][:, a.getnnz(axis=0) != 0].toarray()
array([[1, 0, 0],
       [0, 2, 3]])
also works.
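As for the None: eliminate_zeros() modifies the matrix in place and returns nothing, so assigning its result gives None. It also only removes explicitly stored zero entries and never changes the shape. Since getnnz() counts stored entries, explicit zeros included, it can make sense to call eliminate_zeros() before masking. A small sketch, assuming the matrix from the question with one value overwritten by zero purely for illustration:

```python
import scipy.sparse as sps

aa = sps.csr_matrix(((1, 2, 3), ((0, 2, 2), (0, 1, 2))))
aa.data[0] = 0                 # illustration only: leave an explicitly stored zero in row 0

result = aa.eliminate_zeros()  # works in place; there is no return value
print(result)                  # None
print(aa.nnz, aa.shape)        # stored entries shrink, the shape stays (3, 3)

# Now drop the rows and columns that have no stored entries left.
trimmed = aa[aa.getnnz(axis=1) != 0][:, aa.getnnz(axis=0) != 0]
print(trimmed.toarray())
```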