Identify vectors with the same value in the same column with numpy in python

Question

I have a large 2D array of vectors. I want to split this array into multiple arrays according to the value in one of the columns. Each small array should hold a run of consecutive rows whose values in that column are identical. For example, given the third column:

orig = np.array([[1, 2, 3], 
                 [3, 4, 3], 
                 [5, 6, 4], 
                 [7, 8, 4], 
                 [9, 0, 4], 
                 [8, 7, 3], 
                 [6, 5, 3]])

I want to split it into three arrays consisting of rows 1 and 2, rows 3 to 5, and rows 6 and 7:

>>> a
array([[1, 2, 3],
       [3, 4, 3]])

>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])

>>> c
array([[8, 7, 3],
       [6, 5, 3]])

I am new to python and numpy. Any help would be greatly appreciated.

Regards, Mat

Edit: I've reformatted the arrays to clarify the problem.


python arrays numpy indexing

Mathew May 12 '15 at 11:26


3 answers

Nothing fancy here, but this nice old-fashioned loop should do the trick:

import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

col = -1  # here I assume that the grouping is based on the last column; change the index if that is not the case
groups = []
rows = [a[0]]       # rows of the current run
prev = a[0][col]    # value of the grouping column in the current run
for row in a[1:]:
    if row[col] == prev:
        rows.append(row)               # same value: extend the current run
    else:
        groups.append(np.array(rows))  # value changed: close the current run
        rows = [row]
    prev = row[col]
groups.append(np.array(rows))          # close the final run

print(groups)

## [array([[1, 2, 3],
##         [1, 2, 3]]),
##  array([[1, 2, 4],
##         [1, 2, 4],
##         [1, 2, 4]]),
##  array([[1, 2, 3],
##         [1, 2, 3]])]
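The same "consecutive runs" grouping can also be written with the standard library's itertools.groupby, which only merges adjacent items that share a key. This is a swapped-in technique, not part of the answer above; a minimal sketch, assuming the grouping column is the last one:

```python
import itertools

import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

col = -1  # assumption: group on the last column, as in the loop above

# Iterating over a 2D array yields its rows; groupby starts a new group
# whenever the key (the value in column `col`) changes between adjacent rows.
groups = [np.array(list(run))
          for _, run in itertools.groupby(a, key=lambda row: row[col])]

for g in groups:
    print(g.shape)
```

Unlike grouping by unique value, this keeps the two separate runs of 3s apart, which is what the question asks for.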


Julien spronck 12 May '15 at 12:00


If a looks like this:

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 4],
       [1, 2, 4],
       [1, 2, 4],
       [1, 2, 3],
       [1, 2, 3]])

then this

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)

leads to:

[array([[1, 2, 3],
       [1, 2, 3]]), array([[1, 2, 4],
       [1, 2, 4],
       [1, 2, 4]]), array([[1, 2, 3],
       [1, 2, 3]])]
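To see why the slicing works, here is a short trace of the intermediate arrays for the example above. It uses the same expressions as the code; only the variable names changed and starts and bounds are illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

col = a[:, -1]                                    # [3 3 4 4 4 3 3]
changed = col[:-1] != col[1:]                     # row i differs from row i+1: [F T F F T F]
starts = np.where(changed)[0] + 1                 # first row of each new run: [2 5]
bounds = np.concatenate(([0], starts, [len(a)]))  # slice boundaries: [0 2 5 7]

print(col, changed, starts, bounds, sep="\n")
```

Consecutive pairs of bounds, (0, 2), (2, 5) and (5, 7), are exactly the slices a[start:end] taken by the list comprehension.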

Update: np.split()

This is much nicer. There is no need to add the first and last indices:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)
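If you need this for different columns, the np.split idea can be wrapped in a small helper. The name split_runs is hypothetical, and np.flatnonzero(mask) is used as a shorthand for np.where(mask)[0]; a sketch, applied to the question's orig array with the third column (index 2):

```python
import numpy as np


def split_runs(arr, column):
    """Split arr into consecutive row blocks whose value in `column` is constant.

    Hypothetical helper; it wraps the np.split approach shown above.
    """
    key = arr[:, column]
    # Indices of rows whose key differs from the previous row's key.
    starts = np.flatnonzero(key[1:] != key[:-1]) + 1
    return np.split(arr, starts)


orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

for block in split_runs(orig, 2):
    print(block)
```

If the column never changes, starts is empty and np.split returns a single block containing the whole array.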


Mike Müller May 12 '15 at 12:14


Jaime · Accepted Answer · 2015-05-12T12:12:59+0000

Using np.split:

>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)

>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
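Unpacking the result into a, b, c works only when there are exactly three runs; with any other count Python raises a ValueError. A minimal sketch that keeps the blocks in a list instead, finding the change points with np.diff (assumption: the grouping column is numeric, so its difference is nonzero exactly where the value changes):

```python
import numpy as np

orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

# np.diff(x) computes x[1:] - x[:-1]; a nonzero entry at i means rows i and i+1 differ.
starts = np.flatnonzero(np.diff(orig[:, 2])) + 1
blocks = np.split(orig, starts)  # a list, since the number of runs depends on the data

print(len(blocks))
```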
