Scipy sparse eigensolver: MemoryError after multiple passes through a loop without anything new being written during the loop

I am using Python + Scipy to diagonalize sparse matrices with random entries along the diagonal; in particular, I need eigenvalues in the middle of the spectrum. The code I wrote has worked fine for months, but now I am looking at larger matrices and am running into a MemoryError. What confuses me / drives me crazy is that the error only appears after a few iterations (namely 9) of constructing a random matrix and diagonalizing it, but I don't see any way my code could be storing something extra in memory from one iteration to the next, so I can't see how it could fail during the ninth iteration but not the first.

Here are the details (and apologies in advance if I've left anything out; I'm new to posting on this site):

Each constructed matrix is 16000x16000, with 15x16000 nonzero entries. Everything worked fine when I was looking at 4000x4000 matrices. The main body of my code is:

#Initialization
#...

for i in range(dim):
    for n in range(N):
        digit = (i % 2**(n+1)) / 2**n #Value of bit n of state i (Python 2 integer division)
        index = (i % 2**n) + ((digit + 1) % 2)*(2**n) + (i / 2**(n+1))*(2**(n+1)) #State i with bit n flipped
        row[dim + N*i + n] = index
        col[dim + N*i + n] = i
        dat[dim + N*i + n] = -G #Off-diagonal coupling

e_list = open(e_list_name + "_%03dk_%010ds" % (num_states, int(start_time)), "w")
e_log = open(e_log_name + "_%03dk_%010ds" % (num_states, int(start_time)), "w")

for t in range(num_itr): #Begin iterations
    dat[0:dim] = math.sqrt(N/2.0)*np.random.randn(dim) #Get new diagonal elements
    H = sparse.csr_matrix((dat, (row, col))) #Construct new matrix
    vals = sparse.linalg.eigsh(H, k = num_states + 2, sigma = target_energy, which = 'LM', return_eigenvectors = False) #Get new eigenvalues
    vals = np.sort(vals)

    vals.tofile(e_list)
    e_log.write("Iter %d complete\n" % (t+1))

    e_list.flush()
    e_log.flush()

e_list.close()
e_log.close()


I set num_itr to 100. During the 9th pass through the num_itr loop (as indicated by the 8 lines already written to e_log), the program crashes with the error message:

Can't extend MemType 0: jcol 7438

Traceback (most recent call last):
  File "/usr/lusers/clb37/QREM_Energy_Gatherer.py", line 55, in <module>
    vals = sparse.linalg.eigsh(H, k = num_states + 2, sigma = target_energy, which = 'LM', return_eigenvectors = False)
  File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1524, in eigsh
    symmetric=True, tol=tol)
  File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1030, in get_OPinv_matvec
    return SpLuInv(A.tocsc()).matvec
  File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 898, in __init__
    self.M_lu = splu(M)
  File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 242, in splu
    ilu=False, options=_options)
MemoryError

Sure enough, the program fails during the 9th pass through this loop every time I run it on my machine, and when I try running the code on machines with more memory it makes it through more iterations before crashing, so it does look like the computer is genuinely running out of memory. If that's all there is to it then fine, but what I can't figure out is why the program doesn't crash during the 1st iteration. I don't see any point in the 8 lines of the num_itr loop at which my program writes something to memory without it being overwritten during the following iteration. I've used Heapy's heap() function to look at my memory usage, and it just prints "Total size = 11,715,240 bytes" during every pass.
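In case it helps, the Heapy check looked roughly like this (a minimal sketch only; it assumes the guppy/Heapy package is installed, and the exact placement in my script differs):

from guppy import hpy

hp = hpy()
# ... at the end of each pass through the num_itr loop ...
print(hp.heap())   # prints a summary whose first line reports the total size in bytes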

I feel like there is something fundamental that I just don't know about what's going on here, or some mistake in my code that I don't know to look for, or some detail about how memory is handled. Can anyone explain to me why this code fails during the 9th pass through the num_itr loop but not the first?





1 answer


Ok, this seems to be reproducible on Scipy 0.14.0.

Apparently the problem can be worked around by adding

import gc; gc.collect()

inside the loop, to force Python's cyclic garbage collector to run.
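
Applied to the loop from the question, the workaround would look roughly like this (only the two gc lines are new; everything else is the original loop):

import gc

for t in range(num_itr): #Begin iterations
    dat[0:dim] = math.sqrt(N/2.0)*np.random.randn(dim) #Get new diagonal elements
    H = sparse.csr_matrix((dat, (row, col))) #Construct new matrix
    vals = sparse.linalg.eigsh(H, k = num_states + 2, sigma = target_energy, which = 'LM', return_eigenvectors = False) #Get new eigenvalues
    vals = np.sort(vals)

    vals.tofile(e_list)
    e_log.write("Iter %d complete\n" % (t+1))

    e_list.flush()
    e_log.flush()

    gc.collect() #Force the cyclic collector to free the large temporaries left behind by eigsh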



The problem is that somewhere inside scipy.sparse.linalg.eigsh a reference cycle is created, along the lines of:

class Foo(object):
    pass
a = Foo()
b = Foo()
a.spam = b
b.spam = a
del a, b   # <- but a, b still refer to each other and are not dead


This is still fine in principle: although Python's reference counting does not detect such cyclic garbage, a collection is run periodically to reclaim such objects. However, if each object is very large in memory (such as big Numpy arrays), the periodic runs are too infrequent, and you run out of memory before the next cyclic garbage collection run happens.
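
A small self-contained illustration of both points (nothing here is specific to Scipy; the threshold numbers are just the CPython defaults):

import gc
import weakref

class Foo(object):
    pass

a = Foo()
b = Foo()
a.spam = b
b.spam = a

probe = weakref.ref(a)   # lets us observe when the object is actually freed
del a, b                 # the cycle keeps both objects alive

print(probe() is None)   # False: reference counting alone cannot free them
gc.collect()             # the cyclic collector finds the cycle and frees it
print(probe() is None)   # True

# The collector is triggered by object allocation counts, not by memory use:
print(gc.get_threshold())   # (700, 10, 10) by default
# so a few cycles holding very large arrays can exhaust RAM long before
# enough allocations have happened to trigger the next automatic collection.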

So a workaround is to force the GC to run at a point where you know there is a lot of garbage to collect. The better fix would be to change scipy.sparse.linalg.eigsh so that such cyclic garbage is not generated in the first place.









