Combining two large two-dimensional arrays

I have two large two-dimensional arrays: X1 has shape (1877055, 1299) and X2 has shape (1877055, 1445). I then use

X = np.hstack((X1, X2))

      

to combine the two arrays into a larger array. However, the program does not run to completion: it exits with code -9 and shows no error message.

What is the problem? How can I concatenate two such large two-dimensional arrays?

+3




2 answers


Unless there is something wrong with your NumPy build or your OS (both of which are unlikely), this is almost certainly a memory error.

For example, suppose all of these values are float64. Then you have already allocated at least 18 GB and 20 GB for the two arrays, and now you are trying to allocate another 38 GB for the concatenated array. But you only have, say, 64 GB of RAM plus 2 GB of swap, so there is not enough room for another 38 GB. On some platforms this allocation will simply fail, and NumPy will hopefully catch that and raise a MemoryError. On other platforms the allocation may succeed, but as soon as you try to actually touch all of that memory the process may segfault or be killed outright (see overcommit handling on Linux; a kill by signal 9 would match the -9 exit code in the question). On still other platforms the system will try to auto-expand swap, but if you then run out of disk space, the process will segfault.
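The arithmetic behind those figures can be checked with a few lines of plain Python. This is only a back-of-the-envelope sketch: it assumes dense float64 data (8 bytes per value), the helper name `dense_nbytes` is made up for illustration, and the "GB" figures above correspond to GiB (2**30 bytes) here.

```python
# Rough memory footprint, assuming dense float64 arrays (8 bytes per value).
ITEMSIZE = 8          # bytes per float64 value (assumption about the dtype)
GIB = 2**30           # bytes per GiB


def dense_nbytes(rows, cols, itemsize=ITEMSIZE):
    """Bytes needed to hold a dense rows x cols array."""
    return rows * cols * itemsize


x1_bytes = dense_nbytes(1877055, 1299)
x2_bytes = dense_nbytes(1877055, 1445)
x_bytes = x1_bytes + x2_bytes               # the hstack result
peak_bytes = x1_bytes + x2_bytes + x_bytes  # X1, X2 and X alive at once

for name, n in [("X1", x1_bytes), ("X2", x2_bytes),
                ("X", x_bytes), ("peak", peak_bytes)]:
    print(f"{name}: {n / GIB:.1f} GiB")
```

With all three arrays alive at once, the peak comes to roughly 77 GiB, which is more than the 64 GB of RAM plus 2 GB of swap assumed in the example.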



Whatever the reason, if you cannot fit X1, X2, and X into memory at the same time, what can you do instead?

  • Create X first, and then fill in X1 and X2 by filling sliced views of X.
  • Write X1 and X2 out to disk, concatenate them on disk, and read the result back in.
  • Send X1 and X2 to a subprocess that reads them iteratively, builds X, and then continues the work.
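The first two options might look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe for the arrays in the question: the function names `allocate_combined` and `hstack_on_disk`, the file names, and the `chunk_rows` parameter are all hypothetical, and the on-disk variant assumes X1 and X2 have already been saved as .npy files with np.save.

```python
import os
import tempfile

import numpy as np


def allocate_combined(n_rows, n_cols1, n_cols2, dtype=np.float64):
    """Allocate X once and return it with views onto its two column blocks."""
    X = np.empty((n_rows, n_cols1 + n_cols2), dtype=dtype)
    # Basic slicing returns views, so X1 and X2 share X's memory.
    return X, X[:, :n_cols1], X[:, n_cols1:]


def hstack_on_disk(path1, path2, out_path, chunk_rows=100_000):
    """Concatenate two .npy files column-wise, copying a block of rows at a time."""
    a = np.load(path1, mmap_mode="r")   # memory-mapped, not read into RAM
    b = np.load(path2, mmap_mode="r")
    if a.shape[0] != b.shape[0]:
        raise ValueError("X1 and X2 must have the same number of rows")
    out = np.lib.format.open_memmap(
        out_path, mode="w+",
        dtype=np.result_type(a.dtype, b.dtype),
        shape=(a.shape[0], a.shape[1] + b.shape[1]),
    )
    for start in range(0, a.shape[0], chunk_rows):
        stop = min(start + chunk_rows, a.shape[0])
        out[start:stop, :a.shape[1]] = a[start:stop]
        out[start:stop, a.shape[1]:] = b[start:stop]
    out.flush()
    return out


# Tiny demonstration with toy shapes (the real arrays are far larger).
X, X1, X2 = allocate_combined(4, 2, 3)
X1[:] = 1.0   # fill in place; rebinding X1 = ... would detach it from X
X2[:] = 2.0

with tempfile.TemporaryDirectory() as tmp:
    p1, p2, p3 = (os.path.join(tmp, n) for n in ("x1.npy", "x2.npy", "x.npy"))
    np.save(p1, np.arange(8.0).reshape(4, 2))
    np.save(p2, np.arange(12.0).reshape(4, 3))
    on_disk = np.array(hstack_on_disk(p1, p2, p3, chunk_rows=3))
```

With the first approach, whatever code produces X1 and X2 has to write into the views in place, so the separate 18 GB and 20 GB arrays never exist. With the second, the result can be reopened later with np.load(out_path, mmap_mode="r") if it still does not fit in RAM.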
+7




I'm not a NumPy expert, but why not use numpy.concatenate()?

http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html



For example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])

      

-3

