NumPy way to compute the normalized sum of outer products

I am trying to compute the normalized sum of the outer products of the rows of a 60000x100 matrix. I would like to do this the NumPy way, since my current solution relies on a Python for loop inside a list comprehension:

import numpy as np

def covariance_over_time(X):
    # Sum the outer product of each row with itself, then normalize by the row count
    B = np.sum(np.array([np.outer(x, x) for x in X]), axis=0)
    B = np.true_divide(B, len(X))
    return B

Note that although this solution works, it is single-threaded and therefore very slow when X has 60,000 rows and 100 columns.

I have tried other approaches, as described in this StackOverflow question. The answer posted there works for small matrices, but gives me a memory error after a few seconds. Do you know why? (Note: I have 6 terabytes of RAM, so a genuine memory shortage is highly unlikely, and I can't see memory usage going up at all!)



1 answer


You can simply use matrix multiplication with np.dot:

B = X.T.dot(X)

Then normalize with np.true_divide(B, len(X)).
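
For completeness, here is a minimal sketch of the fully vectorized version (the function name covariance_over_time_vectorized is just for illustration), with a sanity check against the original list-comprehension version on a small random input:

import numpy as np

def covariance_over_time_vectorized(X):
    # X.T.dot(X) accumulates the sum of row-wise outer products in one BLAS call
    return np.true_divide(X.T.dot(X), len(X))

# Sanity check on a small random matrix
X = np.random.rand(500, 10)
B_loop = np.true_divide(np.sum(np.array([np.outer(x, x) for x in X]), axis=0), len(X))
assert np.allclose(covariance_over_time_vectorized(X), B_loop)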


Memory-optimized solutions

If you are still facing memory errors, here are two more approaches.



I. Complete loop solution

We can loop along the second axis of X and perform matrix multiplication between each column and every other column using two loops. Since X has only 100 columns, this fully looped solution iterates only 100 x 100 = 10000 times, and each iteration performs a sum-reduction over 60000 elements (the number of rows in X).

n = X.shape[1]
out = np.empty((n, n), dtype=X.dtype)
for i in range(n):
    for j in range(n):
        # Dot product of columns i and j: a sum-reduction over all rows
        out[i, j] = X[:, i].dot(X[:, j])
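
A quick way to check the double loop against the direct matrix product, on a small stand-in for the 60000x100 array (sizes here are just for the test):

import numpy as np

X = np.random.rand(200, 10)  # small stand-in for the 60000x100 array
n = X.shape[1]
out = np.empty((n, n), dtype=X.dtype)
for i in range(n):
    for j in range(n):
        out[i, j] = X[:, i].dot(X[:, j])
assert np.allclose(out, X.T.dot(X))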

II. Hybrid solution

A hybrid of the fully looped solution and the fully vectorized approach mentioned at the beginning uses a single loop that performs matrix multiplication between each column and the entire array. Each iteration performs 60000 x 100 = 6000000 sum-reductions.

n = X.shape[1]
out = np.empty((n, n), dtype=X.dtype)
for i in range(n):
    # One vector-matrix product per column gives a full row of the output
    out[i] = X[:, i].dot(X)
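
Whichever variant you use, divide by len(X) at the end to get the normalized result; here is a sketch wrapping the hybrid loop (the helper name covariance_hybrid is illustrative):

import numpy as np

def covariance_hybrid(X):
    n = X.shape[1]
    out = np.empty((n, n), dtype=X.dtype)
    for i in range(n):
        out[i] = X[:, i].dot(X)  # builds row i of X.T.dot(X)
    return np.true_divide(out, len(X))

X = np.random.rand(300, 10)
assert np.allclose(covariance_hybrid(X), X.T.dot(X) / len(X))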
