Repeat the scsy csr slit matrix along the 0 axis

I wanted to echo the scipy csr rows of a sparse matrix, but when I tried to call the numpy repeat method it just treats the sparse matrix as an object and will only iterate it as an object in the ndarray. I looked through the documentation, but I couldn't find any utility to iterate over scipy csr rows of a sparse matrix.

I wrote the following code that works with internal data which seems to work

def csr_repeat(csr, repeats):
    if isinstance(repeats, int):
        repeats = np.repeat(repeats, csr.shape[0])
    repeats = np.asarray(repeats)
    rnnz = np.diff(csr.indptr)
    ndata = rnnz.dot(repeats)
    if ndata == 0:
        return sparse.csr_matrix((np.sum(repeats), csr.shape[1]),
                                 dtype=csr.dtype)
    indmap = np.ones(ndata, dtype=np.int)
    indmap[0] = 0
    rnnz_ = np.repeat(rnnz, repeats)
    indptr_ = rnnz_.cumsum()
    mask = indptr_ < ndata
    indmap -= np.int_(np.bincount(indptr_[mask],
                                  weights=rnnz_[mask],
                                  minlength=ndata))
    jumps = (rnnz * repeats).cumsum()
    mask = jumps < ndata
    indmap += np.int_(np.bincount(jumps[mask],
                                  weights=rnnz[mask],
                                  minlength=ndata))
    indmap = indmap.cumsum()
    return sparse.csr_matrix((csr.data[indmap],
                              csr.indices[indmap],
                              np.r_[0, indptr_]),
                             shape=(np.sum(repeats), csr.shape[1]))

      

and be efficient enough, but I'd rather not decapitate the class. Is there a better way to do this?

+3


source to share


1 answer


No wonder it np.repeat

doesn't work. It delegates the action to a hardcoded method a.repeat

, and, otherwise, first turns a

into an array (an object if needed).

In the world of linear algebra, which was developed by the sparse code, most of the assembly work was carried out in arrays row

, col

, data

before you create a sparse matrix. The main focus was on efficient math operations, not so much adding / removing / indexing rows and elements.

I haven't worked on your code, but I'm not surprised the format matrix csr

requires this kind of work.

I have developed a similar function for the format lil

(works from lil.copy

):

def lil_repeat(S, repeat):
    # row repeat for lil sparse matrix
    # test for lil type and/or convert
    shape=list(S.shape)
    if isinstance(repeat, int):
        shape[0]=shape[0]*repeat
    else:
        shape[0]=sum(repeat)
    shape = tuple(shape)
    new = sparse.lil_matrix(shape, dtype=S.dtype)
    new.data = S.data.repeat(repeat) # flat repeat
    new.rows = S.rows.repeat(repeat)
    return new

      



But you can also repeat the use of indexes. Both indexes lil

and csr

support indexing close to that of regular numpy arrays (at least in newer versions). Thus:

S = sparse.lil_matrix([[0,1,2],[0,0,0],[1,0,0]])
print S.A.repeat([1,2,3], axis=0)
print S.A[(0,1,1,2,2,2),:]
print lil_repeat(S,[1,2,3]).A
print S[(0,1,1,2,2,2),:].A

      

give the same result

and best of all?

print S[np.arange(3).repeat([1,2,3]),:].A

      

+2


source







All Articles