Transformation matrix

Question

I have the following small matrix, whose values can only be 0 or 1. The actual matrix I am working with is much larger, but this one is fine for demonstration purposes. Its shape is (8, 11).

np_array = np.matrix(
   [[0,0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,1,0,0,0,0,0],
    [0,0,0,1,0,1,0,0,0,0,0],
    [0,0,1,0,0,1,1,0,0,0,0],
    [0,0,1,0,0,0,1,0,0,0,0],
    [0,1,0,0,0,0,1,1,0,1,1],
    [0,1,0,0,0,0,0,1,0,1,0],
    [1,0,0,0,0,0,0,1,1,1,0]]
)

I need to change it so that each column has at most one row with a value of 1. If several rows have a 1 in the same column, the topmost of those rows (the one with the smallest row index) keeps its 1 and the rest are set to 0. Here is the result I want after the transformation:

np_array1 = np.matrix(
   [[0,0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,1,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,0,0,0],
    [0,0,1,0,0,0,1,0,0,0,0],
    [0,0,0,0,0,0,0,0,0,0,0],
    [0,1,0,0,0,0,0,1,0,1,1],
    [0,0,0,0,0,0,0,0,0,0,0],
    [1,0,0,0,0,0,0,0,1,0,0]]
)

In short, each column may contain a single 1; if several rows have a 1 in the same column, keep only the topmost one. I should mention that there can also be columns in which no row has a 1. Those columns should be left unchanged. The shape of the matrix must stay the same after the transformation.

+3

python numpy

RaduS 10 jul. 17 at 18:47

4 answers

You can use cumsum to count the number of 1s you have seen so far down each column, and then select the first one:

In [42]: arr.cumsum(axis=0)
Out[42]: 
matrix([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 2, 1, 2, 0, 0, 0, 0, 0],
        [0, 0, 1, 2, 1, 3, 1, 0, 0, 0, 0],
        [0, 0, 2, 2, 1, 3, 2, 0, 0, 0, 0],
        [0, 1, 2, 2, 1, 3, 3, 1, 0, 1, 1],
        [0, 2, 2, 2, 1, 3, 3, 2, 0, 2, 1],
        [1, 2, 2, 2, 1, 3, 3, 3, 1, 3, 1]])

and therefore

In [43]: ((arr == 1) & (arr.cumsum(axis=0) == 1)).astype(int)
Out[43]: 
matrix([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
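The cumsum idea above can be wrapped in a small function. This is only a sketch: the name keep_first_one and the 3x3 example array are made up for illustration, and it assumes the input holds only 0s and 1s. np.asarray is used so that an np.matrix input is also accepted.

```python
import numpy as np

def keep_first_one(arr):
    """Keep only the topmost 1 in each column (hypothetical helper name)."""
    arr = np.asarray(arr)           # also accepts np.matrix input
    running = arr.cumsum(axis=0)    # running count of 1s down each column
    # A cell survives only if it holds a 1 and it is the first 1 seen so far.
    return ((arr == 1) & (running == 1)).astype(int)

# Small made-up example: column 2 has no 1 at all and stays all zeros.
example = np.array([[0, 1, 0],
                    [1, 1, 0],
                    [1, 0, 0]])
result = keep_first_one(example)
print(result)
```

The (arr == 1) part matters: without it, zeros that sit below the first 1 but above the second would also have a running count of 1 and would be switched on by mistake.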

+3

DSM 10 jul. 17 at 18:55

Another approach, which modifies a in place:

for i in range(a.shape[1]):
    a[np.where(a[:,i]==1)[0][1:],i] = 0

output:

[[0 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 1 0 1 1]
 [0 0 0 0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 1 0 0]]
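If the original array has to stay untouched, the same loop can work on a copy. A minimal sketch, assuming 0/1 input; the function name keep_first_one_loop and the small test array are hypothetical.

```python
import numpy as np

def keep_first_one_loop(a):
    """Column-by-column version that leaves the input unchanged."""
    out = np.array(a, copy=True)                  # work on a copy (plain ndarray)
    for col in range(out.shape[1]):
        rows = np.flatnonzero(out[:, col] == 1)   # row indices holding a 1
        out[rows[1:], col] = 0                    # clear every 1 except the first
    return out

original = np.array([[0, 1, 0],
                     [1, 1, 0],
                     [1, 0, 0]])
cleaned = keep_first_one_loop(original)
print(cleaned)
```

For a column with zero or one 1s, rows[1:] is empty, so the assignment does nothing and the column is left as it was.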

+1

Rayhane mama 10 jul. 17 at 18:58

You can use the nonzero and unique functions:

c, r = np.nonzero(np_array.T)
_, ind = np.unique(c, return_index=True)
np_array[:] = 0
np_array[r[ind], c[ind]] = 1

For the example, this gives:

[[0 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 1 0 1 1]
 [0 0 0 0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 1 0 0]]
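To spell out why this works, here is a commented sketch of the same steps that writes into a new array instead of overwriting the input. The function name keep_first_one_unique and the demo array are hypothetical, and the input is assumed to be 0/1 only.

```python
import numpy as np

def keep_first_one_unique(a):
    a = np.asarray(a)
    # nonzero() scans in row-major order, so on the transpose the hits come out
    # grouped by column, with row indices ascending inside each column.
    cols, rows = np.nonzero(a.T)
    # return_index gives the position of the first occurrence of each column
    # value, which is the topmost 1 of that column.
    _, first = np.unique(cols, return_index=True)
    out = np.zeros_like(a)
    out[rows[first], cols[first]] = 1
    return out

demo = np.array([[0, 1, 0],
                 [1, 1, 0],
                 [1, 0, 0]])
demo_out = keep_first_one_unique(demo)
print(demo_out)
```

Columns without any 1 never show up in cols, so they simply stay zero in the output.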

+1

Gerges dib 10 jul. 17 at 19:14

Divakar · Accepted Answer · 2017-07-10T18:55:00+0000

Here's one approach -

def per_col(a):
    idx = a.argmax(0)
    out = np.zeros_like(a)
    r = np.arange(a.shape[1])
    out[idx, r] = a[idx, r]
    return out
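A step-by-step look at what per_col does may help. argmax returns the index of the first occurrence of the maximum, so in a 0/1 column it points at the topmost 1; in an all-zero column it returns 0, and the value copied from that position is 0 anyway. The small array below is made up for illustration.

```python
import numpy as np

a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])

idx = a.argmax(0)              # first row holding the column maximum
r = np.arange(a.shape[1])      # column indices 0, 1, 2
picked = a[idx, r]             # value found at that row, one per column
out = np.zeros_like(a)
out[idx, r] = picked           # copy that single value per column into a zero array

print(idx)      # [1 0 0]
print(picked)   # [1 1 0]
print(out)
```

Copying a[idx, r] instead of writing a constant 1 is what keeps all-zero columns unchanged.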

Sample runs

Case #1:

In [41]: a
Out[41]: 
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1],
       [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
       [1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])

In [42]: per_col(a)
Out[42]: 
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])

Case #2 (set column 1 to all zeros):

In [78]: a[:,1] = 0

In [79]: a
Out[79]: 
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
       [1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])

In [80]: per_col(a)
Out[80]: 
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])

If you're crazy about one-liners or a fan of broadcasting, here's another -

((a.argmax(0) == np.arange(a.shape[0])[:,None]).astype(int))*a.any(0)

Example run -

In [89]: a
Out[89]: 
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
       [1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])

In [90]: ((a.argmax(0) == np.arange(a.shape[0])[:,None]).astype(int))*a.any(0)
Out[90]: 
array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
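The one-liner is easier to follow when split into named pieces. The variable names and the small array below are only for illustration; the logic is the same expression as above.

```python
import numpy as np

a = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])

first_row = a.argmax(0)                      # shape (3,): topmost-1 row per column
row_ids = np.arange(a.shape[0])[:, None]     # shape (3, 1): row indices as a column
is_first = first_row == row_ids              # broadcasts to (3, 3): True at (first_row[j], j)
has_one = a.any(0)                           # False for all-zero columns
out = is_first.astype(int) * has_one         # zero out columns that had no 1

print(out)
```

The multiplication by a.any(0) is needed because argmax returns 0 for an all-zero column, which would otherwise put a spurious 1 in row 0 of that column.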

Runtime test -

In [98]: a = np.random.randint(0,2,(100,10000))

# @DSM soln
In [99]: %timeit ((a == 1) & (a.cumsum(axis=0) == 1)).astype(int)
100 loops, best of 3: 5.19 ms per loop

# Proposed in this post : soln1
In [100]: %timeit per_col(a)
100 loops, best of 3: 3.4 ms per loop

# Proposed in this post : soln2
In [101]: %timeit ((a.argmax(0) == np.arange(a.shape[0])[:,None]).astype(int))*a.any(0)
100 loops, best of 3: 7.73 ms per loop
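Timings like these depend on the machine and the NumPy version, so before comparing speeds it is worth confirming that the three solutions agree. A rough sketch of such a check: the helper names via_cumsum and via_broadcast, the seed, and the array size are arbitrary choices, not part of the original test.

```python
import numpy as np

def via_cumsum(a):
    return ((a == 1) & (a.cumsum(axis=0) == 1)).astype(int)

def per_col(a):
    idx = a.argmax(0)
    out = np.zeros_like(a)
    r = np.arange(a.shape[1])
    out[idx, r] = a[idx, r]
    return out

def via_broadcast(a):
    return ((a.argmax(0) == np.arange(a.shape[0])[:, None]).astype(int)) * a.any(0)

rng = np.random.default_rng(0)               # fixed seed, chosen arbitrarily
a = rng.integers(0, 2, size=(100, 1000))     # random 0/1 array
a[:, 0] = 0                                  # force one all-zero column

ref = via_cumsum(a)
all_agree = np.array_equal(ref, per_col(a)) and np.array_equal(ref, via_broadcast(a))
print(all_agree)
```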
