numpy.dot → MemoryError, my_dot → very slow but works. Why?

I am trying to calculate the dot product of two numpy arrays of shape (162225, 10000) and (10000, 100) respectively. However, when I call numpy.dot(A, B), I get a MemoryError. I then tried to write my own implementation:

import numpy as np

def slower_dot(A, B):
    """Low-memory implementation of the dot product."""
    # Assuming A and B have compatible dtypes and shapes
    R = np.empty([A.shape[0], B.shape[1]])
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            R[i, j] = np.dot(A[i, :], B[:, j])
    return R


It works fine, but is of course very slow. Any idea 1) why this happens and 2) how I could work around / fix the problem?

I am using Python 3.4.2 (64-bit) and numpy 1.9.1 on a 64-bit machine with 16 GB of RAM, running Ubuntu 14.10.

+3


source


2 answers


I think the problem starts with matrix A itself: a 162225 × 10000 matrix already takes about 12 GB of memory if each element is a double-precision floating-point number. That, together with the temporary copies numpy creates to perform the dot operation, results in the error. The extra copies come from the fact that numpy uses BLAS routines for dot, and BLAS requires the matrices to be stored in contiguous C order.
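For illustration, the 12 GB figure is easy to check with a back-of-the-envelope calculation (shapes taken from the question, float64 assumed):

```python
# Rough memory estimate for the arrays in the question,
# assuming float64 (8 bytes per element).
rows_A, cols_A = 162225, 10000
cols_B = 100

A_gib = rows_A * cols_A * 8 / 2**30   # matrix A alone
R_gib = rows_A * cols_B * 8 / 2**30   # the result is comparatively tiny

print(round(A_gib, 2))   # ~12.09 GiB, most of the 16 GB of RAM
print(round(R_gib, 3))   # ~0.121 GiB
```

So A alone consumes most of the available RAM, and any temporary copy of it pushes the process over the limit.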

Check out these links if you'd like to read more about improving dot performance:

http://wiki.scipy.org/PerformanceTips



Speed up numpy.dot

https://github.com/numpy/numpy/pull/2730

+1


source


The reason you are getting a memory error is probably that numpy tries to copy one or both arrays inside the call to dot. For small to medium-sized arrays, this is often the most efficient option, but for large arrays you need to micromanage numpy to avoid the MemoryError. Your slower_dot function is slow largely because of Python function call overhead, which you incur 162225 × 100 times. Here's one common way to deal with this situation when you need to balance memory and performance limits.

import numpy as np

def chunking_dot(big_matrix, small_matrix, chunk_size=100):
    # Make a copy if the array is not already contiguous
    small_matrix = np.ascontiguousarray(small_matrix)
    R = np.empty((big_matrix.shape[0], small_matrix.shape[1]))
    for i in range(0, R.shape[0], chunk_size):
        end = i + chunk_size
        R[i:end] = np.dot(big_matrix[i:end], small_matrix)
    return R

You need to choose the chunk_size that works best for your specific array sizes. Typically, larger chunk sizes will be faster, as long as everything still fits in memory.
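One way to pick chunk_size is from a memory budget rather than by trial and error. This helper is my own illustration (pick_chunk_size and the 500 MiB budget are assumptions, not part of the answer above):

```python
def pick_chunk_size(n_cols, itemsize=8, budget_bytes=500 * 2**20):
    # Hypothetical helper: how many float64 rows of the big matrix fit
    # inside a fixed temporary-memory budget (500 MiB by default).
    return max(1, budget_bytes // (n_cols * itemsize))

# With 10000 columns of float64, a 500 MiB budget allows 6553 rows per chunk.
print(pick_chunk_size(10000))
```

The idea is that each chunk passed to np.dot (plus any copy BLAS makes of it) stays within a fixed budget, regardless of how many rows the big matrix has.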

+2


source

