Identify vectors with the same value in the same column with numpy in python

I have a large 2D array of vectors. I want to split this array into several arrays according to one of the elements (dimensions) of the vectors: each run of consecutive rows that share the same value in that column should become one smaller array. For example, splitting on the third column of:

orig = np.array([[1, 2, 3], 
                 [3, 4, 3], 
                 [5, 6, 4], 
                 [7, 8, 4], 
                 [9, 0, 4], 
                 [8, 7, 3], 
                 [6, 5, 3]])

      

I want to turn this into three arrays consisting of rows 1-2, rows 3-5, and rows 6-7:

>>> a
array([[1, 2, 3],
       [3, 4, 3]])

>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])

>>> c
array([[8, 7, 3],
       [6, 5, 3]])

      

I am new to python and numpy. Any help would be greatly appreciated.

Regards, Mat

Edit: I've reformatted the arrays to clarify the problem.



3 answers


Using np.split:



>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)

>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
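
To see why the one-liner works, it may help to unpack it into steps. The sketch below uses the `orig` array from the question; the intermediate names (`col`, `changed`, `boundaries`, `parts`) are mine and purely illustrative.

```python
import numpy as np

orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

col = orig[:, 2]                       # grouping column: [3 3 4 4 4 3 3]
changed = col[:-1] != col[1:]          # True where a row differs from the next row
boundaries = np.where(changed)[0] + 1  # index of the first row of each new run
parts = np.split(orig, boundaries)     # list of sub-arrays, one per run

print(boundaries)  # [2 5]
print(len(parts))  # 3
```

Note that this groups only consecutive rows: the two runs of 3s stay separate, which is what the question asks for.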

      



Nothing fancy here, but a good old-fashioned loop should do the trick:



import numpy as np

a = np.array([[1, 2, 3], 
              [1, 2, 3], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 3], 
              [1, 2, 3]])
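groups = []
rows = a[:1]  # start the first group as a 2-D array holding only the first row
# Here I assume that the grouping is based on the last column;
# change the index accordingly if that is not the case.
prev = a[0][-1]
for row in a[1:]:
    if row[-1] == prev:
        # same value as the previous row: extend the current group
        rows = np.vstack((rows, row))
    else:
        # value changed: store the finished group and start a new 2-D one
        groups.append(rows)
        rows = row[np.newaxis, :]
    prev = row[-1]
groups.append(rows)  # do not forget the last group

print(groups)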

## [array([[1, 2, 3],
##         [1, 2, 3]]),
##  array([[1, 2, 4],
##         [1, 2, 4],
##         [1, 2, 4]]),
##  array([[1, 2, 3],
##         [1, 2, 3]])]
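
If you like the loop approach but want less bookkeeping, the standard library's `itertools.groupby` can do the run detection for you. This is a different technique from the loop above, offered as a rough sketch; it still assumes the grouping is on the last column.

```python
from itertools import groupby

import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

# groupby merges only *consecutive* items with equal keys, so the two
# separate runs of 3s stay separate, just like in the loop version.
groups = [np.array(list(rows))
          for _, rows in groupby(a, key=lambda row: row[-1])]

print([g.shape for g in groups])  # [(2, 3), (3, 3), (2, 3)]
```

For large arrays this will likely be slower than the vectorized np.split answers, since it iterates row by row in Python.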

      



If a looks like this:

array([[1, 1, 2, 3],
       [2, 1, 2, 3],
       [3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4],
       [6, 1, 2, 3],
       [7, 1, 2, 3]])

      

then this:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)

      

leads to:

[array([[1, 1, 2, 3],
       [2, 1, 2, 3]]), array([[3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4]]), array([[6, 1, 2, 3],
       [7, 1, 2, 3]])]

      

Update: np.split() is much nicer. No need to add the first and last index:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)
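
If you need this in more than one place, the three lines above can be wrapped in a small function. This is only a sketch: `split_runs` and its `column` parameter are names I made up, not part of NumPy, and `np.flatnonzero` is used here as a shorthand for `np.where(...)[0]`.

```python
import numpy as np

def split_runs(arr, column=-1):
    """Split a 2-D array into runs of consecutive rows that share the
    same value in `column` (hypothetical helper, not a NumPy function)."""
    col = arr[:, column]
    # positions where a row's value differs from the previous row's value
    boundaries = np.flatnonzero(col[:-1] != col[1:]) + 1
    return np.split(arr, boundaries)

orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

a, b, c = split_runs(orig, column=2)

# With no change in the column, the result is a list holding one array.
single = split_runs(orig[:2], column=2)
print(len(single))  # 1
```

Unpacking into `a, b, c` only works when you know the number of runs in advance; otherwise keep the returned list.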

      
