Transformation matrix
I have the following small matrix of numeric values; each value can only be 0 or 1. The actual matrix I am using is much larger, but this one is fine for demonstration purposes. Its shape is (8, 11).
import numpy as np

np_array = np.matrix(
[[0,0,0,0,1,0,0,0,0,0,0],
[0,0,0,1,0,1,0,0,0,0,0],
[0,0,0,1,0,1,0,0,0,0,0],
[0,0,1,0,0,1,1,0,0,0,0],
[0,0,1,0,0,0,1,0,0,0,0],
[0,1,0,0,0,0,1,1,0,1,1],
[0,1,0,0,0,0,0,1,0,1,0],
[1,0,0,0,0,0,0,1,1,1,0]]
)
I need to change it so that each column has at most one row with a value of 1. If several rows have a 1 in the same column, the topmost of those rows (the first one counting from the top) keeps its 1 and the rest are set to 0. Here is the result I want after the transformation:
np_array1 = np.matrix(
[[0,0,0,0,1,0,0,0,0,0,0],
[0,0,0,1,0,1,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0],
[0,0,1,0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0],
[0,1,0,0,0,0,0,1,0,1,1],
[0,0,0,0,0,0,0,0,0,0,0],
[1,0,0,0,0,0,0,0,1,0,0]]
)
Basically, each column can have only a single 1; if multiple rows have a 1, keep the topmost one. I should mention that there can also be columns where none of the rows have a value of 1. These columns should be left unchanged. The shape of the matrix must stay the same as before the transformation.
Here's one approach -
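def per_col(a):
    # Row index of the first maximum in each column (row 0 for all-zero columns)
    idx = a.argmax(0)
    out = np.zeros_like(a)
    r = np.arange(a.shape[1])
    # Copy only the selected entries; all-zero columns copy a 0 and stay zero
    out[idx, r] = a[idx, r]
    return out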
Sample runs
Case #1:
In [41]: a
Out[41]:
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1],
[0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])
In [42]: per_col(a)
Out[42]:
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
Case #2 (set a column to all zeros):
In [78]: a[:,1] = 0
In [79]: a
Out[79]:
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])
In [80]: per_col(a)
Out[80]:
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
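To see why per_col keeps the topmost 1 and leaves all-zero columns alone, it can help to look at the intermediate values on a tiny input. This is only a sketch: the 3x3 array is made up for illustration and is not from the question, and it assumes a plain ndarray rather than np.matrix.

```python
import numpy as np

# Hypothetical 3x3 input: column 0 has two 1s, column 2 has none
a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])

# argmax returns the index of the FIRST maximum along axis 0,
# so for 0/1 data it is the topmost row holding a 1 (or 0 if there is none)
idx = a.argmax(0)

out = np.zeros_like(a)
cols = np.arange(a.shape[1])
# Copy the value found at (idx, col); for the all-zero column this copies a 0
out[idx, cols] = a[idx, cols]

print(idx)
print(out)
```

Because the assignment copies a[idx, r] instead of writing a literal 1, no separate check for empty columns is needed.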
If you like one-liners or are a fan of broadcasting, here's another -
((a.argmax(0) == np.arange(a.shape[0])[:,None]).astype(int))*a.any(0)
Example run -
In [89]: a
Out[89]:
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])
In [90]: ((a.argmax(0) == np.arange(a.shape[0])[:,None]).astype(int))*a.any(0)
Out[90]:
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
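The broadcasting one-liner can be unpacked into named steps, which may make it easier to follow. A minimal sketch on a made-up 3x3 ndarray; the intermediate names (first, rows, mask, has_one) are mine and not part of the original answer.

```python
import numpy as np

# Hypothetical 3x3 input: column 2 is all zeros
a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])

first = a.argmax(0)                     # shape (3,): topmost row of the max per column
rows = np.arange(a.shape[0])[:, None]   # shape (3, 1): column vector of row indices
mask = first == rows                    # broadcasts to (3, 3): True where row == argmax row
has_one = a.any(0)                      # shape (3,): False for all-zero columns

# Multiplying by has_one clears the spurious True that argmax
# places in row 0 of every all-zero column
result = mask.astype(int) * has_one

print(result)
```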
Runtime test -
In [98]: a = np.random.randint(0,2,(100,10000))
# @DSM soln
In [99]: %timeit ((a == 1) & (a.cumsum(axis=0) == 1)).astype(int)
100 loops, best of 3: 5.19 ms per loop
# Proposed in this post : soln1
In [100]: %timeit per_col(a)
100 loops, best of 3: 3.4 ms per loop
# Proposed in this post : soln2
In [101]: %timeit ((a.argmax(0) == np.arange(a.shape[0])[:,None]).astype(int))*a.any(0)
100 loops, best of 3: 7.73 ms per loop
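Timings are only meaningful if the three expressions return the same thing, so a quick equivalence check on random 0/1 data may be worth running first. This is a sketch: the wrapper names cumsum_first and broadcast_first are hypothetical, and np.random.default_rng is used here in place of the np.random.randint call above.

```python
import numpy as np

def per_col(a):
    idx = a.argmax(0)
    out = np.zeros_like(a)
    r = np.arange(a.shape[1])
    out[idx, r] = a[idx, r]
    return out

def cumsum_first(a):
    # cumsum-based expression from the timing above
    return ((a == 1) & (a.cumsum(axis=0) == 1)).astype(int)

def broadcast_first(a):
    # broadcasting one-liner from the timing above
    return ((a.argmax(0) == np.arange(a.shape[0])[:, None]).astype(int)) * a.any(0)

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(100, 1000))
a[:, 0] = 0  # force at least one all-zero column into the test data

all_agree = (np.array_equal(per_col(a), cumsum_first(a))
             and np.array_equal(per_col(a), broadcast_first(a)))
print(all_agree)
```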
You can use cumsum to count the number of 1s you see and then select the first one:
In [42]: arr.cumsum(axis=0)
Out[42]:
matrix([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 2, 1, 2, 0, 0, 0, 0, 0],
[0, 0, 1, 2, 1, 3, 1, 0, 0, 0, 0],
[0, 0, 2, 2, 1, 3, 2, 0, 0, 0, 0],
[0, 1, 2, 2, 1, 3, 3, 1, 0, 1, 1],
[0, 2, 2, 2, 1, 3, 3, 2, 0, 2, 1],
[1, 2, 2, 2, 1, 3, 3, 3, 1, 3, 1]])
and therefore
In [43]: ((arr == 1) & (arr.cumsum(axis=0) == 1)).astype(int)
Out[43]:
matrix([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
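The cumsum expression can be wrapped in a small function. A minimal sketch, assuming the input should be treated as a plain ndarray (the function name keep_first_one is hypothetical, and np.asarray converts an np.matrix input, so the return type differs from the matrix output shown above):

```python
import numpy as np

def keep_first_one(arr):
    """Return a 0/1 array that keeps only the topmost 1 in each column."""
    arr = np.asarray(arr)
    seen = arr.cumsum(axis=0)   # running count of 1s down each column
    # A cell survives if it is a 1 AND it is the first 1 seen in its column
    return ((arr == 1) & (seen == 1)).astype(int)

# Hypothetical 3x3 input: column 2 is all zeros
a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])
result = keep_first_one(a)
print(result)
```

The arr == 1 test matters: without it, the zeros that sit below the first 1 (where the running count is still 1) would be selected as well.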
Another approach:
for i in range(a.shape[1]):
    # Zero every 1 in column i except the first one (modifies a in place)
    a[np.where(a[:, i] == 1)[0][1:], i] = 0
output:
[[0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 1 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 1 0 1 1]
[0 0 0 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 1 0 0]]
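Note that the loop overwrites a. If the original needs to be kept, the same idea can be applied to a copy. A minimal sketch, assuming a 2-D ndarray input; the function name keep_first_one_loop is hypothetical:

```python
import numpy as np

def keep_first_one_loop(a):
    """Column-by-column version that leaves the input untouched."""
    out = np.array(a, copy=True)
    for i in range(out.shape[1]):
        ones = np.flatnonzero(out[:, i] == 1)  # row indices of the 1s in column i
        out[ones[1:], i] = 0                   # clear all but the first; no-op if 0 or 1 found
    return out

# Hypothetical 3x3 input: column 2 is all zeros
a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])
result = keep_first_one_loop(a)
print(result)
```

A Python-level loop over columns is likely to be slower than the vectorized answers when the matrix has many columns.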
You can use the nonzero and unique functions:
c, r = np.nonzero(np_array.T)
_, ind = np.unique(c, return_index=True)
np_array[:] = 0
np_array[r[ind], c[ind]] = 1
For the example matrix, the result is:
[[0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 1 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 1 0 1 1]
[0 0 0 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 1 0 0]]
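The nonzero/unique steps are easier to follow with the intermediate arrays printed. A minimal sketch on a made-up 3x3 ndarray; unlike the snippet above, it writes into a fresh array (out, a name I introduced) instead of overwriting the input.

```python
import numpy as np

# Hypothetical 3x3 input: column 2 is all zeros
a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])

# Transposing first makes nonzero scan column by column, top to bottom,
# so c (column index) comes out sorted and r lists rows in order within each column
c, r = np.nonzero(a.T)

# return_index gives the position of the FIRST occurrence of each column index,
# i.e. the topmost 1 in that column; all-zero columns never appear in c
_, ind = np.unique(c, return_index=True)

out = np.zeros_like(a)
out[r[ind], c[ind]] = 1

print(c, r, ind)
print(out)
```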