Identify vectors with the same value in the same column with numpy in python
I have a large 2d array of vectors. I want to split this array into multiple arrays according to one of the elements or sizes of the vectors. I would like to get one such small array if the values along this column are consistently identical. For example, given the third dimension or column:
orig = np.array([[1, 2, 3],
[3, 4, 3],
[5, 6, 4],
[7, 8, 4],
[9, 0, 4],
[8, 7, 3],
[6, 5, 3]])
I want to turn into three arrays consisting of rows 1,2 and 3,4,5 and 6,7:
>>> a
array([[1, 2, 3],
[3, 4, 3]])
>>> b
array([[5, 6, 4],
[7, 8, 4],
[9, 0, 4]])
>>> c
array([[8, 7, 3],
[6, 5, 3]])
I am new to python and numpy. Any help would be greatly appreciated.
Relationship Mat
Edit: I've reformatted the arrays to clarify the problem.
source to share
Nothing fancy here, but this nice old fashioned loop should do the trick
import numpy as np
a = np.array([[1, 2, 3],
[1, 2, 3],
[1, 2, 4],
[1, 2, 4],
[1, 2, 4],
[1, 2, 3],
[1, 2, 3]])
groups = []
rows = a[0]
prev = a[0][-1] # here i assume that the grouping is based on the last column, change the index accordingly if that is not the case.
for row in a[1:]:
if row[-1] == prev:
rows = np.vstack((rows, row))
else:
groups.append(rows)
rows = [row]
prev = row[-1]
groups.append(rows)
print groups
## [array([[1, 2, 3],
## [1, 2, 3]]),
## array([[1, 2, 4],
## [1, 2, 4],
## [1, 2, 4]]),
## array([[1, 2, 3],
## [1, 2, 3]])]
source to share
if a
it looks like this:
array([[1, 1, 2, 3],
[2, 1, 2, 3],
[3, 1, 2, 4],
[4, 1, 2, 4],
[5, 1, 2, 4],
[6, 1, 2, 3],
[7, 1, 2, 3]])
than this
col = a[:, -1] indices = np.where(col[:-1] != col[1:])[0] + 1 indices = np.concatenate(([0], indices, [len(a)])) res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])] print(res)
leads to:
[array([[1, 2, 3],
[1, 2, 3]]), array([[1, 2, 4],
[1, 2, 4],
[1, 2, 4]]), array([[1, 2, 3],
[1, 2, 3]])]
Update: np.split()
much nicer. No need to add the first and last index:
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)
source to share