Split large numpy array into separate arrays with a list of grouped indices

Given two arrays: one holding the underlying dataset, and a second holding groups of indices that refer into it. What's the fastest way to generate new arrays from the index data?

Here's my current solution for generating two arrays from a list of index pairs:

# Let's make a large point cloud with 1 million entries
# and an array of random index pairs into it
import numpy as np
COUNT = 1000000
POINT_CLOUD = np.random.rand(COUNT, 3) * 100
INDICES = (np.random.rand(COUNT, 2) * COUNT).astype(int)  # (1,10),(233,12),...

# Split into sub-arrays. np.squeeze is needed here because INDICES[:, [0]]
# keeps a trailing length-1 axis that I don't want.
LIST1 = POINT_CLOUD[np.squeeze(INDICES[:, [0]])]
LIST2 = POINT_CLOUD[np.squeeze(INDICES[:, [1]])]


This works, but it's a bit slow, and it's only useful for creating two lists. It would be great to have a solution that handles index groups of any size, e.g. ((1,2,3,4), (8,4,5,3), ...).

So something like:

# PSEUDO CODE using quadruple keys
INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
SPLIT = POINT_CLOUD[<some pythonic magic>[INDICES]]
SPLIT[0] = np.array([points from INDEX #1])
SPLIT[1] = np.array([points from INDEX #2])
SPLIT[2] = np.array([points from INDEX #3])
SPLIT[3] = np.array([points from INDEX #4])



1 answer


You just need to transpose the index matrix:

>>> result = POINT_CLOUD[INDICES.T]
>>> np.allclose(result[0], LIST1)
True
>>> np.allclose(result[1], LIST2)
True
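To see why the transpose does the trick, here is a tiny sketch (the small arrays are made up for illustration): advanced indexing with a (k, N) integer array against an (M, 3) array returns shape (k, N, 3), so each row of the transposed index matrix selects one point per group.

```python
import numpy as np

# 5 points, 3 coordinates each: row i is point i
points = np.arange(5 * 3).reshape(5, 3)

# 3 pairs of point indices, same layout as INDICES in the question
indices = np.array([[0, 4],
                    [2, 1],
                    [3, 3]])

# indices.T has shape (2, 3): row 0 holds the first index of each pair,
# row 1 the second, so the result is grouped by index column.
split = points[indices.T]          # shape (2, 3, 3)
```

Here `split[0]` equals `points[[0, 2, 3]]` and `split[1]` equals `points[[4, 1, 3]]`, exactly the two lists built with `np.squeeze` in the question.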

If you know the number of index groups, you can also unpack the result:



>>> result.shape
(2, 1000000, 3)
>>> L1, L2 = result
>>> np.allclose(L1, LIST1)
True
>>> # etc


The same trick works for index groups of any size. For the second example in your question:

>>> INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
>>> SPLIT = POINT_CLOUD[INDICES.T]
>>> SPLIT.shape
(4, 1000000, 3)
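One caveat worth knowing (a general NumPy property, not specific to this answer): advanced indexing copies, so `SPLIT` holds k full copies of the selected points. If memory is tight, you can process one group at a time by iterating over the columns of the index matrix instead. A minimal sketch, with a smaller `COUNT` than the question's:

```python
import numpy as np

COUNT = 1000
POINT_CLOUD = np.random.rand(COUNT, 3) * 100
INDICES = (np.random.rand(COUNT, 4) * COUNT).astype(int)

# One column of INDICES per group; each iteration materializes
# only a single (COUNT, 3) copy instead of all four at once.
for col in INDICES.T:
    group = POINT_CLOUD[col]       # shape (COUNT, 3), a fresh copy
    # ... process group here ...
```

Whether this beats the one-shot `POINT_CLOUD[INDICES.T]` depends on what you do with each group; the one-shot version is simpler when all groups fit in memory.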

