# A special kind of row-by-row multiplication of two sparse matrices in Python

What I'm looking for: a way to implement in Python a special multiplication operation for matrices that happen to be in scipy sparse (csr) format. This is a special kind of multiplication: not matrix multiplication, not the Kronecker product, and not the Hadamard (aka pointwise) product, and it doesn't seem to have built-in support in scipy.sparse.

Required operation: each row of the output must contain the products of all pairs of elements from the corresponding rows of the two input matrices. So, starting with two identically sized matrices, each with dimensions m by n, the result should have dimensions m by n^2.

It looks like this:

Python code:

```
import numpy as np
import scipy.sparse

A = scipy.sparse.csr_matrix(np.array([[1,2],[3,4]]))
B = scipy.sparse.csr_matrix(np.array([[0,5],[6,7]]))
# C is the desired product of A and B. It should look like:
C = scipy.sparse.csr_matrix(np.array([[0,5,0,10],[18,21,24,28]]))
```
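For reference, the desired operation can be written densely with an explicit loop of row-wise outer products (an illustrative sketch only; `rowwise_kron_dense` is a name introduced here, not an existing function):

```python
import numpy as np

def rowwise_kron_dense(A, B):
    # Row i of the result is the outer product of row i of A
    # and row i of B, flattened to length n1*n2.
    m, n1 = A.shape
    n2 = B.shape[1]
    C = np.empty((m, n1 * n2), dtype=np.result_type(A, B))
    for i in range(m):
        C[i] = np.outer(A[i], B[i]).ravel()
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 5], [6, 7]])
print(rowwise_kron_dense(A, B))
# [[ 0  5  0 10]
#  [18 21 24 28]]
```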

What would be a good or efficient way to do this? I've tried looking here on Stack Overflow and elsewhere, with no luck. So far the best I've found is a row-by-row operation in a for loop, but that sounds awful given that **my input matrices have several million rows and several thousand columns, and are mostly sparse**.


In your example, `C` consists of the first and last rows of `kron`:
```
In [4]: A=np.array([[1,2],[3,4]])
In [5]: B=np.array([[0,5],[6,7]])
In [6]: np.kron(A,B)
Out[6]:
array([[ 0,  5,  0, 10],
       [ 6,  7, 12, 14],
       [ 0, 15,  0, 20],
       [18, 21, 24, 28]])
In [7]: np.kron(A,B)[[0,3],:]
Out[7]:
array([[ 0,  5,  0, 10],
       [18, 21, 24, 28]])
```

`kron` contains the same values as `np.outer`, but in a different order.

For large dense arrays, `einsum` can provide good speed:

```
np.einsum('ij,ik->ijk',A,B).reshape(A.shape[0],-1)
```
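A quick sanity check, using the small arrays from the question, that the `einsum` form reproduces the desired `C`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 5], [6, 7]])

# 'ij,ik->ijk' builds the per-row outer products; reshaping
# flattens each (n1, n2) slab into one row of length n1*n2.
C = np.einsum('ij,ik->ijk', A, B).reshape(A.shape[0], -1)
print(C)
# [[ 0  5  0 10]
#  [18 21 24 28]]
```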

`sparse.kron` does the same as `np.kron`:

```
As = sparse.csr_matrix(A); Bs = sparse.csr_matrix(B)
sparse.kron(As,Bs).tocsr()[[0,3],:].A
```
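The `[[0,3],:]` selection generalizes: for m-row inputs, row i of the desired result sits at row `i*m + i` of the full Kronecker product (a sketch of that generalization; the index formula is inferred here, not stated in the answer):

```python
import numpy as np
import scipy.sparse as sparse

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 5], [6, 7]])
As = sparse.csr_matrix(A)
Bs = sparse.csr_matrix(B)

m = As.shape[0]
# kron pairs every row of A with every row of B; we keep only
# the matching pairs, which live at rows i*m + i.
idx = np.arange(m) * m + np.arange(m)   # [0, 3] when m == 2
C = sparse.kron(As, Bs).tocsr()[idx, :]
print(C.toarray())
# [[ 0  5  0 10]
#  [18 21 24 28]]
```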

`sparse.kron` is written in Python, so you could probably modify it if it is doing unnecessary calculations.

An iterative solution looks like this:

```
sparse.vstack([sparse.kron(a,b) for a,b in zip(As,Bs)]).A
```

Being iterative, I don't expect it to be faster than computing the full `kron` and selecting rows. But without digging into the logic of `sparse.kron`, it is perhaps the best I can do.

`vstack` uses `bmat`, so the calculation is:

```
sparse.bmat([[sparse.kron(a,b)] for a,b in zip(As,Bs)])
```

But `bmat` is quite complex, so simplifying it won't be easy.

The `np.einsum` solution cannot easily be extended to sparse matrices: there is no `sparse.einsum`, and the intermediate result is 3D, which sparse does not handle.

`sparse.kron` uses the `coo` format, which is not well suited to working with rows. But in the spirit of that function, I wrote a function that iterates over the rows of matrices in `csr` format. Like `kron` and `bmat`, it builds the `data`, `row`, and `col` arrays and constructs a `coo_matrix` from them. That can in turn be converted to other formats.

```
import numpy as np
from scipy import sparse

def test_iter(A, B):
    m, n1 = A.shape
    n2 = B.shape[1]
    Cshape = (m, n1*n2)
    data = np.empty((m,), dtype=object)
    col = np.empty((m,), dtype=object)
    row = np.empty((m,), dtype=object)
    for i, (a, b) in enumerate(zip(A, B)):
        data[i] = np.outer(a.data, b.data).flatten()
        #col1 = a.indices * np.arange(1, a.nnz+1)  # wrong when a isn't dense
        col1 = a.indices * n2                      # correction
        col[i] = (col1[:, None] + b.indices).flatten()
        row[i] = np.full((a.nnz*b.nnz,), i)
    data = np.concatenate(data)
    col = np.concatenate(col)
    row = np.concatenate(row)
    return sparse.coo_matrix((data, (row, col)), shape=Cshape)
```
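The column-index bookkeeping (the "correction" line above) is worth checking on a row that is not dense. With a row nonzero only in column 1 and n2 = 2, the products must land in output columns 2 and 3, which `a.indices * n2` gets right and the commented-out version would not:

```python
import numpy as np
import scipy.sparse as sparse

a = sparse.csr_matrix(np.array([[0, 3]]))   # nonzero only at column 1
b = sparse.csr_matrix(np.array([[6, 7]]))
n2 = b.shape[1]

col1 = a.indices * n2                        # block offset for A-column 1 -> [2]
cols = (col1[:, None] + b.indices).ravel()   # output columns -> [2, 3]
data = np.outer(a.data, b.data).ravel()      # values -> [18, 21]
print(cols, data)
```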

With these small 2x2 matrices, as well as larger ones (e.g. `A1 = sparse.rand(1000, 2000).tocsr()`), this is about 3x faster than the `bmat` version. For large enough matrices, the dense `einsum` version (which can run into memory errors) is better.


A non-optimal way to do this is to apply `kron` separately to each row:

```
def my_mult(A, B):
    nrows = A.shape[0]
    prodrows = []
    for i in range(nrows):
        Arow = A.getrow(i)
        Brow = B.getrow(i)
        prodrow = scipy.sparse.kron(Arow, Brow)
        prodrows.append(prodrow)
    return scipy.sparse.vstack(prodrows)
```
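A quick sanity check of this approach on the matrices from the question (the function is restated compactly here so the snippet runs standalone):

```python
import numpy as np
import scipy.sparse

def my_mult(A, B):
    # row-by-row Kronecker products, stacked back into one sparse matrix
    rows = [scipy.sparse.kron(A.getrow(i), B.getrow(i))
            for i in range(A.shape[0])]
    return scipy.sparse.vstack(rows)

A = scipy.sparse.csr_matrix(np.array([[1, 2], [3, 4]]))
B = scipy.sparse.csr_matrix(np.array([[0, 5], [6, 7]]))
C = my_mult(A, B)
print(C.toarray())
# [[ 0  5  0 10]
#  [18 21 24 28]]
```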

This has about 3x worse performance than @hpaulj's solution above, as the following benchmark shows:

```
A = scipy.sparse.rand(20000, 1000, density=0.05).tocsr()
B = scipy.sparse.rand(20000, 1000, density=0.05).tocsr()
# Check memory
%memit C1 = test_iter(A,B)
%memit C2 = my_mult(A,B)
# Check time
%timeit C1 = test_iter(A,B)
%timeit C2 = my_mult(A,B)
# Last but not least, check correctness!
print((C1 - C2).nnz == 0)
```

**Results:**

@hpaulj's method:

```
peak memory: 1993.93 MiB, increment: 1883.80 MiB
1 loops, best of 3: 6.42 s per loop
```

This method:

```
peak memory: 2456.75 MiB, increment: 1558.78 MiB
1 loops, best of 3: 18.9 s per loop
```
