Rearranging a numpy array

I need to parse the positions of two objects over time, and I get the data as a numpy array:

import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

      

So the first column is the position, the second is the time at which point A was at that position, and the last column is the time at which point B was at that position. The data is guaranteed to be consistent: if any two rows share the same time, they share the same position, and vice versa. In pseudocode:

data[row1,1] == data[row2,1]  <=>  data[row1,0] == data[row2,0]
data[row1,2] == data[row2,2]  <=>  data[row1,0] == data[row2,0]
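
The consistency condition above can be checked without comparing every pair of rows. A minimal sketch, assuming data is an integer NumPy array with columns (position, time A, time B); the helper name is_consistent is hypothetical. It uses the fact that the "<=>" condition holds for a time column exactly when the number of distinct (time, position) pairs equals both the number of distinct times and the number of distinct positions, i.e. the mapping is one-to-one in both directions.

```python
import numpy as np

def is_consistent(data):
    """Hypothetical helper: check the <=> conditions for both time columns."""
    for t_col in (1, 2):
        pairs = np.unique(data[:, [t_col, 0]], axis=0)  # distinct (time, position) rows
        n_times = np.unique(data[:, t_col]).size
        n_positions = np.unique(data[:, 0]).size
        # one-to-one in both directions <=> all three counts agree
        if not (len(pairs) == n_times == n_positions):
            return False
    return True

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])
print(is_consistent(data))  # True
```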

      

I would like to rearrange this array so that it lists every available time and the corresponding positions, for example:

parsed = [[1, 0, 2],
          [2, 2, 0],
          [3, np.nan, 1],
          [4, 1, np.nan]]

      

Here the first column is the time, the second is the position of point A, and the third is the position of point B. np.nan should be used wherever I have no information about a point's position. What I currently do is split the data array into two separate arrays:

    import numpy as np

    moments = set(data[:, 1:3].flatten())

    for each in moments:
        # (time, position) rows where A / B was observed at time `each`
        a = data[:, [1, 0]][data[:, 1] == each]
        b = data[:, [2, 0]][data[:, 2] == each]

      

and then put them back together, as done here by John Galt. ... This works, but I hope there is a better solution. Can anyone point me in the right direction?
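
For comparison, the split-and-reassemble idea from the question can be written as a single loop over the sorted times. This is a minimal sketch under the question's assumptions (data is an integer NumPy array, and the consistency guarantee holds, so all rows matching a given time share one position); it is not the code from the linked answer.

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

# every time at which either point was observed, sorted ascending
moments = np.unique(data[:, 1:3])

rows = []
for each in moments:
    a = data[data[:, 1] == each, 0]  # positions of A at this time (all equal by the guarantee)
    b = data[data[:, 2] == each, 0]  # positions of B at this time (all equal by the guarantee)
    rows.append([each,
                 a[0] if a.size else np.nan,   # NaN when A was not seen at this time
                 b[0] if b.size else np.nan])  # NaN when B was not seen at this time

parsed = np.array(rows, dtype=float)
print(parsed)
```

Each iteration scans all rows twice, so the cost grows with (number of rows) x (number of distinct times); avoiding that loop is the point of a vectorized approach.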

2 answers


Here's one approach using NumPy array initialization and assignment -

import numpy as np

# Gather a and b indices (times). Their union represents all possible indices
a_idx = data[:,1]
b_idx = data[:,2]
all_idx = np.union1d(a_idx, b_idx)

# Set up the output array, pre-filled with NaN
out = np.full((all_idx.size,3),np.nan)

# Assign all indices to the first column
out[:,0] = all_idx

# Find where the a indices sit in all_idx and put the positions (first col of data) there
out[np.searchsorted(all_idx, a_idx),1] = data[:,0]
# Similarly for b
out[np.searchsorted(all_idx, b_idx),2] = data[:,0]

      

np.searchsorted acts like a godsend here, as it gives us the places where we need to put a and b from data in the already sorted array all_idx (np.union1d returns sorted unique values), and it is known to be quite efficient.
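
To see what np.searchsorted returns in this case, it can be run on its own on the sample data. A small illustration: every time in a_idx and b_idx is present in all_idx by construction, so the (left) insertion point that searchsorted reports is exactly the row of all_idx holding that time.

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

a_idx = data[:, 1]                  # times of point A: [1, 4, 2]
b_idx = data[:, 2]                  # times of point B: [2, 3, 1]
all_idx = np.union1d(a_idx, b_idx)  # sorted union: [1, 2, 3, 4]

rows_a = np.searchsorted(all_idx, a_idx)  # [0, 3, 1]
rows_b = np.searchsorted(all_idx, b_idx)  # [1, 2, 0]
print(rows_a, rows_b)
```

So row 0 of data (position 0) lands in output row 0 for A and output row 1 for B, and so on; the output rows that neither index array hits keep their NaN.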



The output for the sample data is:

In [104]: out
Out[104]: 
array([[  1.,   0.,   2.],
       [  2.,   2.,   0.],
       [  3.,  nan,   1.],
       [  4.,   1.,  nan]])

      

In the absence of better ideas, let me offer a pandas one-liner. Disclaimer: it runs 100 times slower than Divakar's pure NumPy solution:



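import pandas as pd

df = pd.DataFrame(data)
# sort_index() keeps the times ascending; recent pandas versions no longer sort in concat by default
pd.concat([df.set_index(ix)[0] for ix in [1,2]], axis=1).sort_index().reset_index().values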
#array([[  1.,   0.,   2.],
#       [  2.,   2.,   0.],
#       [  3.,  nan,   1.],
#       [  4.,   1.,  nan]])
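
To see why the pandas one-liner works, the two intermediate Series can be built separately: each maps a time (the index) to a position (column 0), and concat with axis=1 aligns them on that time index as an outer join, filling times missing from one Series with NaN. A sketch assuming the same data array; the explicit sort_index() call is an addition so the rows come out in time order regardless of the pandas version's concat sorting default.

```python
import numpy as np
import pandas as pd

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])
df = pd.DataFrame(data)  # columns are labelled 0, 1, 2

pos_a = df.set_index(1)[0]  # index: time of A, values: position
pos_b = df.set_index(2)[0]  # index: time of B, values: position

# outer-align on the time index; times missing from one Series become NaN
combined = pd.concat([pos_a, pos_b], axis=1).sort_index()
result = combined.reset_index().values
print(result)
```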

      
