Repeat the scsy csr slit matrix along the 0 axis
I wanted to echo the scipy csr rows of a sparse matrix, but when I tried to call the numpy repeat method it just treats the sparse matrix as an object and will only iterate it as an object in the ndarray. I looked through the documentation, but I couldn't find any utility to iterate over scipy csr rows of a sparse matrix.
I wrote the following code that works with internal data which seems to work
def csr_repeat(csr, repeats):
if isinstance(repeats, int):
repeats = np.repeat(repeats, csr.shape[0])
repeats = np.asarray(repeats)
rnnz = np.diff(csr.indptr)
ndata = rnnz.dot(repeats)
if ndata == 0:
return sparse.csr_matrix((np.sum(repeats), csr.shape[1]),
dtype=csr.dtype)
indmap = np.ones(ndata, dtype=np.int)
indmap[0] = 0
rnnz_ = np.repeat(rnnz, repeats)
indptr_ = rnnz_.cumsum()
mask = indptr_ < ndata
indmap -= np.int_(np.bincount(indptr_[mask],
weights=rnnz_[mask],
minlength=ndata))
jumps = (rnnz * repeats).cumsum()
mask = jumps < ndata
indmap += np.int_(np.bincount(jumps[mask],
weights=rnnz[mask],
minlength=ndata))
indmap = indmap.cumsum()
return sparse.csr_matrix((csr.data[indmap],
csr.indices[indmap],
np.r_[0, indptr_]),
shape=(np.sum(repeats), csr.shape[1]))
and be efficient enough, but I'd rather not decapitate the class. Is there a better way to do this?
source to share
No wonder it np.repeat
doesn't work. It delegates the action to a hardcoded method a.repeat
, and, otherwise, first turns a
into an array (an object if needed).
In the world of linear algebra, which was developed by the sparse code, most of the assembly work was carried out in arrays row
, col
, data
before you create a sparse matrix. The main focus was on efficient math operations, not so much adding / removing / indexing rows and elements.
I haven't worked on your code, but I'm not surprised the format matrix csr
requires this kind of work.
I have developed a similar function for the format lil
(works from lil.copy
):
def lil_repeat(S, repeat):
# row repeat for lil sparse matrix
# test for lil type and/or convert
shape=list(S.shape)
if isinstance(repeat, int):
shape[0]=shape[0]*repeat
else:
shape[0]=sum(repeat)
shape = tuple(shape)
new = sparse.lil_matrix(shape, dtype=S.dtype)
new.data = S.data.repeat(repeat) # flat repeat
new.rows = S.rows.repeat(repeat)
return new
But you can also repeat the use of indexes. Both indexes lil
and csr
support indexing close to that of regular numpy arrays (at least in newer versions). Thus:
S = sparse.lil_matrix([[0,1,2],[0,0,0],[1,0,0]])
print S.A.repeat([1,2,3], axis=0)
print S.A[(0,1,1,2,2,2),:]
print lil_repeat(S,[1,2,3]).A
print S[(0,1,1,2,2,2),:].A
give the same result
and best of all?
print S[np.arange(3).repeat([1,2,3]),:].A
source to share